TESTING FOR WHITE NOISE UNDER UNKNOWN
DEPENDENCE AND ITS APPLICATIONS
TO GOODNESS-OF-FIT FOR TIME SERIES MODELS

By Xiaofeng Shao 
June 29, 2009 
University of Illinois at Urbana-Champaign

Testing for white noise has been well studied in the literature of econometrics 
and statistics. For most of the proposed test statistics, such as the well-known 
Box-Pierce's test statistic with fixed lag truncation number, the asymptotic null 
distributions are obtained under independent and identically distributed assump- 
tions and may not be valid for the dependent white noise. Due to recent popu- 
larity of conditional heteroscedastic models (e.g. GARCH models), which imply 
nonlinear dependence with zero autocorrelation, there is a need to understand the 
asymptotic properties of the existing test statistics under unknown dependence. 
In this paper, we showed that the asymptotic null distribution of Box-Pierce's test 
statistic with general weights still holds under unknown weak dependence so long 
as the lag truncation number grows at an appropriate rate with increasing sample 
size. Further applications to diagnostic checking of the ARMA and FARIMA mod- 
els with dependent white noise errors are also addressed. Our results go beyond 
earlier ones by allowing non-Gaussian and conditional heteroscedastic errors in the 
ARMA and FARIMA models and provide theoretical support for some empirical 
findings reported in the literature. 

1 Introduction 

A fundamental problem in time series analysis is to test for white noise (or lack 
of serial correlation). For a zero-mean stationary process $\{u_t\}$ with finite
variance $\sigma^2 = \mathrm{var}(u_t)$, denote its covariance and correlation functions by
$R_u(k) = \mathrm{cov}(u_t, u_{t+k})$ and $\rho_u(k) = R_u(k)/\sigma^2$, $k \in \mathbb{Z}$, respectively.
Then the null and alternative hypotheses are

$$H_0: \rho_u(j) = 0 \text{ for all } j \neq 0, \quad \text{and} \quad H_1: \rho_u(j) \neq 0 \text{ for some } j \neq 0.$$

Let $f_u(\lambda) = (2\pi)^{-1} \sum_{k \in \mathbb{Z}} \rho_u(k) e^{-ik\lambda}$ be the normalized spectral density function of
$u_t$. The equivalent frequency domain expressions to $H_0$ and $H_1$ are

$$H_0: f_u(\lambda) = \frac{1}{2\pi}, \ \lambda \in [-\pi, \pi), \quad \text{and} \quad H_1: f_u(\lambda) \neq \frac{1}{2\pi} \text{ for some } \lambda \in [-\pi, \pi).$$

In statistical modeling, diagnostic checking is an integral part of model building.
A common way of testing the adequacy of the proposed model is by checking the 
assumption of white noise residuals. Systematic departure from this assumption 
implies the inadequacy of the fitted model. Thus testing for white noise is an 
important research topic and it has been extensively studied in the literature of 
econometrics and statistics. 

The methodologies can be roughly divided into two categories: time domain 
tests and frequency domain tests. In the time domain, the most popular test 
is probably Box and Pierce's (1970) (BP) portmanteau test, which admits the 
following form: 

$$Q_n = \sum_{j=1}^{m} \hat\rho_u^2(j),$$

where $m$ is the so-called lag truncation number [see Hong (1996)] and is (typically)
assumed to be fixed. The empirical autocorrelation $\hat\rho_u(j)$ is defined as $\hat\rho_u(j) =
\hat R_u(j)/\hat R_u(0)$ with $\hat R_u(j) = n^{-1} \sum_{t=|j|+1}^{n} (u_t - \bar u)(u_{t-|j|} - \bar u)$, where $\bar u = n^{-1} \sum_{t=1}^{n} u_t$.
Under the assumption that $\{u_t\}_{t \in \mathbb{Z}}$ are independent and identically distributed
(iid), it can be shown that $nQ_n \to_D \chi^2(m)$, where "$\to_D$" stands for convergence
in distribution. If $\{u_t\}_{t=1}^{n}$ are replaced by the residuals from a well specified model,
then the limiting distribution is still $\chi^2$ but the degrees of freedom are reduced to
$m - m'$, where $m'$ is the number of parameters in the model. In the frequency domain,
Bartlett (1955) proposed test statistics based on the famous Up and Tp processes 
and a rigorous theoretical treatment of their limiting distributions was provided by 
Grenander and Rosenblatt (1957). Other contributions to the frequency domain 
tests can be found in Durlauf (1991) and Deo (2000) among others. 
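
To make the time domain statistic concrete, the following is a minimal sketch of the sample autocorrelations and of $nQ_n$ as defined above. It uses only the Python standard library; the function names `sample_autocorr` and `box_pierce` are illustrative choices, not part of the paper.

```python
import random


def sample_autocorr(u, max_lag):
    """Return [rho_hat(1), ..., rho_hat(max_lag)] with the n^{-1} normalisation."""
    n = len(u)
    mean = sum(u) / n
    d = [x - mean for x in u]
    r0 = sum(x * x for x in d) / n  # R_hat(0)
    rho = []
    for j in range(1, max_lag + 1):
        # R_hat(j) = n^{-1} * sum_{t=j+1}^{n} (u_t - mean)(u_{t-j} - mean)
        rj = sum(d[t] * d[t - j] for t in range(j, n)) / n
        rho.append(rj / r0)
    return rho


def box_pierce(u, m):
    """Return n * Q_n = n * sum_{j=1}^{m} rho_hat(j)^2."""
    rho = sample_autocorr(u, m)
    return len(u) * sum(r * r for r in rho)


if __name__ == "__main__":
    random.seed(0)
    u = [random.gauss(0.0, 1.0) for _ in range(500)]
    # Under iid noise this is compared with a chi-square(m) quantile,
    # e.g. about 11.07 for m = 5 at the 5% level.
    print(box_pierce(u, 5))
```

The chi-square comparison in the final comment is the iid calibration; the paper's point is that this calibration can fail for fixed $m$ when the noise is uncorrelated but dependent.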

In the literature, when deriving the asymptotic null distribution of the test 
statistic, most earlier works assume Gaussianity and thus lack of correlation is
equivalent to independence. Lately there has been work that stresses the distinction
between lack of correlation and independence. The main reason is that the asymptotic
null distributions of the above-mentioned test statistics were obtained under
iid assumptions on $u_t$, and may not hold in the presence of nonlinear dependence,
such as conditional heteroscedasticity. For example, Romano and Thombs (1996)
showed that the BP statistic with $\chi^2$ approximation can lead to misleading inferences
when the time series is uncorrelated but dependent. Francq et al. (2005)
also demonstrated that the BP test applied to the residuals of an ARMA model 
with uncorrelated but dependent errors performs poorly without suitable modi- 
fications. Various methods have been proposed to account for the dependence; 
see for example, Romano and Thombs (1996), Lobato et al. (2002), Francq et al.
(2005) and Horowitz et al. (2006) among others. At this point, it seems natural to
ask: "Does there exist a test statistic whose asymptotic null distribution is robust
to the unknown dependence of $u_t$?" We shall give an affirmative answer in this
paper.

In a seminal paper, Hong (1996) proposed several test statistics, which measure
the distance between a kernel-based spectral density estimator and the spectral 
density of the noise under the null hypothesis. Let 

$$\hat f_n(w) = (2\pi)^{-1} \sum_{j=-n+1}^{n-1} K(j/m_n) \hat\rho_u(j) e^{-ijw}$$

be the lag window estimator of the normalized spectral density function [Priestley
(1981)], where $K(\cdot)$ is a nonnegative symmetric kernel function, $m_n$ is the
bandwidth that depends on the sample size. With the quadratic distance, Hong's
statistic is expressed as

$$T_n = \pi \int_{-\pi}^{\pi} \left\{ \hat f_n(w) - \frac{1}{2\pi} \right\}^2 dw,$$

or equivalently,

$$T_n = \sum_{j=1}^{n-1} K^2(j/m_n) \hat\rho_u^2(j).$$

It is worth noting that the BP statistic can be regarded as a special case of Hong's,
where $K(\cdot)$ is taken to be the truncated kernel $K(x) = \mathbf{1}(|x| \le 1)$. Under the iid
assumptions on $u_t$ and $1/m_n + m_n/n \to 0$, Hong (1996) established the asymptotic
null distribution of $T_n$, i.e.

$$\frac{nT_n - C_n(K)}{\{2D_n(K)\}^{1/2}} \to_D N(0,1), \qquad (1)$$

where $C_n(K) = \sum_{j=1}^{n-1} (1 - j/n) K^2(j/m_n)$, $D_n(K) = \sum_{j=1}^{n-2} (1 - j/n)\{1 - (j+1)/n\} K^4(j/m_n)$
and $N(0,1)$ stands for the standard normal distribution. Under
some additional assumptions on $K(\cdot)$ and $m_n$, (1) holds with $C_n(K)$ and $D_n(K)$
replaced by $m_nC(K)$ and $m_nD(K)$ respectively, where $C(K) = \int_0^\infty K^2(x)dx$ and
$D(K) = \int_0^\infty K^4(x)dx$. Later Hong and Lee (2003) established the above result
assuming $u_t$ to be martingale differences with conditional heteroscedasticity of unknown
form. One of the major contributions of this paper is to show that Hong's
test statistic is still asymptotically valid under general white noise assumption on
$u_t$. Further, we establish that when replacing $u_t$ by $\hat u_t$, the residuals from the
ARMA model with uncorrelated and dependent errors, the asymptotic null distribution
of $T_n$ still holds. Our assumptions and results differ from those in Francq
et al. (2005) in that $m$ is held fixed in their asymptotic distributional theory,
while $m = m(n)$ grows with the sample size $n$ in our setting. From a theoretical
standpoint, the fourth cumulant of $u_t$ plays a non-negligible role in the asymptotic
distribution of $Q_n$ when $m$ is fixed, whereas it turns out to be asymptotically negligible
in $T_n$ when $m_n \to \infty$. So in the latter case, the asymptotic null distribution
does not change under dependent white noise, i.e. the dependence is automatically
accounted for if $m$ and $n$ both grow to infinity. The theoretical finding is also
consistent with the empirical results reported in the simulation studies of Francq
et al. (2005), where the empirical size of the BP test is seen to be reasonably close
to the nominal one when $n$ is large and $m$ is relatively large compared to $n$.
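
The standardized form of Hong's statistic can be sketched in a few lines. The code below assumes the centering and scaling $C_n(K) = \sum_j (1 - j/n)K^2(j/m_n)$ and $D_n(K) = \sum_j (1 - j/n)(1 - (j+1)/n)K^4(j/m_n)$ of Hong (1996) and a kernel supported on $[-1, 1]$, so that only lags $j \le m_n$ contribute. The function names and the choice of the Bartlett kernel as default are illustrative assumptions.

```python
import math
import random


def bartlett(x):
    """Bartlett kernel, supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def truncated(x):
    """Truncated kernel; with this choice T_n is the BP statistic Q_n."""
    return 1.0 if abs(x) <= 1.0 else 0.0


def sample_autocorr(u, max_lag):
    """Return [rho_hat(1), ..., rho_hat(max_lag)] with the n^{-1} normalisation."""
    n = len(u)
    mean = sum(u) / n
    d = [x - mean for x in u]
    r0 = sum(x * x for x in d) / n
    return [sum(d[t] * d[t - j] for t in range(j, n)) / n / r0
            for j in range(1, max_lag + 1)]


def hong_statistic(u, m, kernel=bartlett):
    """Return (n*T_n - C_n(K)) / sqrt(2*D_n(K)) for a kernel supported on [-1, 1]."""
    n = len(u)
    max_lag = min(n - 1, int(math.floor(m)))  # K(j/m) = 0 for j > m
    rho = sample_autocorr(u, max_lag)
    t_n = c_n = d_n = 0.0
    for j in range(1, max_lag + 1):
        k2 = kernel(j / m) ** 2
        t_n += k2 * rho[j - 1] ** 2
        c_n += (1.0 - j / n) * k2
        d_n += (1.0 - j / n) * (1.0 - (j + 1) / n) * k2 * k2
    if d_n <= 0.0:
        raise ValueError("bandwidth too small: no lag receives positive weight")
    return (n * t_n - c_n) / math.sqrt(2.0 * d_n)


if __name__ == "__main__":
    random.seed(1)
    u = [random.gauss(0.0, 1.0) for _ in range(500)]
    print(hong_statistic(u, 10))  # compared with standard normal critical values
```

Passing `kernel=truncated` recovers a standardized Box-Pierce statistic with $m$ lags, which is the special case discussed in the text.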

Recently, there has been considerable attention paid to the goodness-of-fit for 
long memory time series. Here we only mention some representative works. Ex- 
tending Hong's (1996) idea, Chen and Deo (2004a) proposed a generalized port- 
manteau test based on the discrete spectral average estimator and obtained the 
asymptotic null distribution for Gaussian long memory time series. Following the 
early work by Bartlett (1955), Delgado et al. (2005) studied Bartlett's Tp process 
with estimated parameters and a martingale transform approach was used to make 
the null distribution asymptotically distribution-free. In a related work, Hidalgo
and Kreiss (2006) proposed to use bootstrap methods in the frequency domain
to approximate the sampling distribution of Bartlett's $T_p$ statistic with estimated
parameters. In these two articles, the asymptotic distributional theory heavily 
relies on the assumption that the noise processes are conditionally homoscedastic 
martingale differences. 

In the last decade, the FARIMA (fractional autoregressive integrated moving 
average) models with GARCH errors have been widely used in the modeling litera- 
ture [cf. Lien and Tse (1999), Elek and Markus (2004), Koopman et al. (2007)]. In 
the modeling stage of a FARIMA-GARCH model, it is customary to fit a FARIMA
model first and then fit a GARCH model to the residuals. It is crucial to specify 
the FARIMA model correctly since the model misspecification of the conditional 
mean often leads to the misspecification of the GARCH model; see Lumsdaine and 
Ng (1999). Thus diagnostic checking of FARIMA models with unknown GARCH 
errors is a very important issue. Note that Ling and Li (1997) and Li and Li 
(2008) have studied the BP type tests for FARIMA-GARCH models assuming a 
parametric form for the GARCH model. To the best of our knowledge, there seems
to be no diagnostic checking methodology known or theoretically justified to work for
long memory time series models with nonparametric conditionally heteroscedas- 
tic martingale difference errors. In this article, we shall fill this gap by proving 
asymptotic validity of Hong's test statistic when we replace the unobserved errors 
by the estimated counterpart from a FARIMA model. 

We now introduce some notation. For a column vector $x = (x_1, \cdots, x_q)' \in \mathbb{R}^q$,
let $|x| = (\sum_{j=1}^{q} x_j^2)^{1/2}$. For a random vector $\xi$, write $\xi \in \mathcal{L}^p$ ($p > 0$) if $\|\xi\|_p :=
[E(|\xi|^p)]^{1/p} < \infty$ and let $\|\cdot\| = \|\cdot\|_2$. For $\xi \in \mathcal{L}^1$ define projection operators
$\mathcal{P}_k\xi = E(\xi|\mathcal{F}_k) - E(\xi|\mathcal{F}_{k-1})$, $k \in \mathbb{Z}$, where $\mathcal{F}_k = (\cdots, \varepsilon_{k-1}, \varepsilon_k)$ with $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ being
iid random variables. Let $C > 0$ denote a generic constant which may vary from
line to line; denote by $\to_p$ convergence in probability. The symbols $O_p(1)$ and
$o_p(1)$ signify being bounded in probability and convergence to zero in probability
respectively. The paper is structured as follows. In Section 2 we introduce our
assumptions on $u_t$ and establish the asymptotic distributions of $T_n$ under the null
and alternative hypotheses. Section 3 discusses the case when $u_t$ are not directly
observable. Here we consider the ARMA and FARIMA models with dependent
white noise errors in Section 3.1 and Section 3.2 respectively. Section 4 concludes.
Proofs are gathered in Section 5.

2 When $u_t$ is observable



Suitable structural assumptions on the process $(u_t)$ are certainly needed. Throughout,
we assume that $(u_t)$ is a mean zero stationary causal process of the form

$$u_t = F(\cdots, \varepsilon_{t-1}, \varepsilon_t), \qquad (2)$$

where $\varepsilon_t$ are iid random variables, and $F$ is a measurable function for which $u_t$
is well defined. Further we assume $u_t$ satisfies the geometric-moment contraction
(GMC) condition [Hsing and Wu (2004), Shao and Wu (2007), Wu and Shao
(2004)]. Let $(\varepsilon'_k)_{k \in \mathbb{Z}}$ be an iid copy of $(\varepsilon_k)_{k \in \mathbb{Z}}$; let $u'_n = F(\cdots, \varepsilon'_{-1}, \varepsilon'_0, \varepsilon_1, \cdots, \varepsilon_n)$
be a coupled version of $u_n$. We say that $u_n$ is GMC($\alpha$), $\alpha > 0$, if there exist $C > 0$
and $\rho = \rho(\alpha) \in (0, 1)$ such that

$$E(|u_n - u'_n|^\alpha) \le C\rho^n, \quad n \in \mathbb{N}. \qquad (3)$$

The property (3) indicates that the process $\{u_n\}$ forgets its past exponentially fast,
and it can be verified for many nonlinear time series models, such as threshold
model, bilinear model and various forms of GARCH models; see Wu and Min
(2005) and Shao and Wu (2007).
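
The coupling in (3) can be illustrated by simulation. The sketch below uses an ARCH(1) recursion $u_t = \varepsilon_t(\omega + a u_{t-1}^2)^{1/2}$ with Gaussian innovations as an assumed example; the parameter values, the burn-in length used to approximate the infinite past, and the function names are all illustrative and not taken from the paper. Two paths are started from independent pasts and then driven by the same future innovations, and the Monte Carlo average of $|u_n - u'_n|^\alpha$ is recorded for each $n$.

```python
import math
import random


def arch1_path(eps, omega, a, u_start):
    """Iterate u_t = eps_t * sqrt(omega + a * u_{t-1}^2) starting from u_0 = u_start."""
    u_prev = u_start
    path = []
    for e in eps:
        u_prev = e * math.sqrt(omega + a * u_prev * u_prev)
        path.append(u_prev)
    return path


def coupling_gap(n_steps, n_rep, omega=1.0, a=0.3, alpha=2.0, burn=200, seed=1):
    """Monte Carlo estimate of E|u_n - u'_n|^alpha for n = 1, ..., n_steps."""
    rng = random.Random(seed)
    gap = [0.0] * n_steps
    for _ in range(n_rep):
        # Two independent pasts (times <= 0), approximated by a finite burn-in.
        past = [rng.gauss(0.0, 1.0) for _ in range(burn)]
        past_copy = [rng.gauss(0.0, 1.0) for _ in range(burn)]
        u0 = arch1_path(past, omega, a, 0.0)[-1]
        u0_copy = arch1_path(past_copy, omega, a, 0.0)[-1]
        # Shared innovations for times 1, ..., n_steps.
        future = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
        p = arch1_path(future, omega, a, u0)
        q = arch1_path(future, omega, a, u0_copy)
        for i in range(n_steps):
            gap[i] += abs(p[i] - q[i]) ** alpha / n_rep
    return gap


if __name__ == "__main__":
    g = coupling_gap(30, 200)
    for n in (1, 5, 10, 20, 30):
        print(n, g[n - 1])  # the gap should shrink roughly geometrically in n
```

A geometric decay of the printed values is what a bound of the form $C\rho^n$ predicts; the simulation is only a numerical illustration and does not verify the moment order required later in Theorem 2.1.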

Besides conditional heteroscedastic models, which imply uncorrelation due to 
the martingale difference structure, there are a few commonly used models [see 
Lobato et al. (2002)] that are uncorrelated but are not martingale differences. We 
shall show that these models satisfy GMC property under appropriate assump- 
tions. 

Example 2.1. Bilinear model [Granger and Anderson (1978)]:

$$u_t = \varepsilon_t + b\varepsilon_{t-1}u_{t-2},$$

where $\varepsilon_t$ are iid $N(0, \sigma^2)$ and $|b| < 1$. According to Example 5.3 in Shao and Wu
(2007), $u_t$ is GMC($\alpha$), $\alpha \ge 1$ if



E 




< 1, 



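where for a $p \times p$ matrix $A$, $|A|_\alpha = \sup_{z \neq 0} |Az|_\alpha/|z|_\alpha$, $\alpha \ge 1$, is the matrix norm
induced by the vector norm $|z|_\alpha = (\sum_{j=1}^{p} |z_j|^\alpha)^{1/\alpha}$.

Example 2.2. All-Pass ARMA(1,1) model [Breidt et al. (2001)]:

$$u_t = \phi u_{t-1} + \varepsilon_t - \phi^{-1}\varepsilon_{t-1},$$

where $|\phi| < 1$ and $\varepsilon_t \sim iid(0, \sigma^2)$. Note that $u_t = \varepsilon_t + \sum_{j=1}^{\infty} (\phi^j - \phi^{j-2})\varepsilon_{t-j}$. Since
$|\phi| < 1$, $u_t$ is GMC($\alpha$) if $\varepsilon_t \in \mathcal{L}^\alpha$. In view of Theorem 5.2 in Shao and Wu (2007),
the all-pass ARMA($p, p$) model also satisfies GMC($\alpha$) provided that $\varepsilon_t \in \mathcal{L}^\alpha$.

Example 2.3. Nonlinear moving average model [Granger and Terasvirta (1993)]:

$$u_t = \beta\varepsilon_{t-1}\varepsilon_{t-2} + \varepsilon_t,$$

where $\varepsilon_t \sim iid(0, \sigma^2)$ and $\beta \in \mathbb{R}$. It is easily seen that $u_t$ is GMC($\alpha$) if $\varepsilon_t \in \mathcal{L}^\alpha$.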
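
A short simulation shows what "uncorrelated but dependent" means for Example 2.3. The sketch assumes standard Gaussian innovations and $\beta = 1$, both of which are choices made here for illustration. Under these choices a direct moment calculation gives a lag-one correlation of $2/7$ between $u_t^2$ and $u_{t-1}^2$, while the lag-one correlation of $u_t$ itself is zero.

```python
import random


def nonlinear_ma(n, beta, rng):
    """Simulate u_t = beta * eps_{t-1} * eps_{t-2} + eps_t with N(0, 1) innovations."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]
    return [beta * eps[s - 1] * eps[s - 2] + eps[s] for s in range(2, n + 2)]


def lag_corr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    r0 = sum(v * v for v in d) / n
    rj = sum(d[t] * d[t - lag] for t in range(lag, n)) / n
    return rj / r0


if __name__ == "__main__":
    u = nonlinear_ma(20000, 1.0, random.Random(0))
    print(lag_corr(u, 1))                   # close to 0: u_t is white noise
    print(lag_corr([v * v for v in u], 1))  # clearly positive: u_t is not iid
```

The second number is the kind of higher-order dependence that invalidates the iid-based limit theory for a fixed number of lags.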

To obtain the asymptotic distribution of $T_n$, the following assumption is made
on the kernel function $K(\cdot)$ and is satisfied by several commonly-used kernels in
spectral analysis, such as Bartlett, Parzen and Tukey kernels (see Priestley (1981),
p. 446-447).

Assumption 2.1. Assume the kernel function $K: \mathbb{R} \to [-1, 1]$ has compact support
on $[-1, 1]$, is differentiable except at a finite number of points and symmetric with
$K(0) = 1$, $\max_{x \in [-1,1]} |K(x)| = K_0 < \infty$.

The assumption that K{-) has compact support can presumably be relaxed at 
the expense of longer and more technical proof; see Chen and Deo (2004a). Here 
we decide to retain it to avoid more technical complications. 
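
For reference, the kernels named above and the constants that enter the limit theory can be computed numerically. The sketch assumes the definitions $C(K) = \int_0^\infty K^2(x)dx$ and $D(K) = \int_0^\infty K^4(x)dx$ used by Hong (1996), the standard textbook forms of the Bartlett, Parzen and Tukey-Hanning kernels, and a simple midpoint rule on $[0, 1]$; function names are illustrative.

```python
import math


def truncated(x):
    return 1.0 if abs(x) <= 1.0 else 0.0


def bartlett(x):
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0


def parzen(x):
    a = abs(x)
    if a <= 0.5:
        return 1.0 - 6.0 * a * a + 6.0 * a ** 3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def tukey_hanning(x):
    return 0.5 * (1.0 + math.cos(math.pi * x)) if abs(x) <= 1.0 else 0.0


def kernel_constants(kernel, grid=20000):
    """Midpoint-rule approximations of C(K) and D(K) over the support [0, 1]."""
    h = 1.0 / grid
    c = d = 0.0
    for i in range(grid):
        k = kernel((i + 0.5) * h)
        c += k * k * h
        d += k ** 4 * h
    return c, d


if __name__ == "__main__":
    for name, k in [("truncated", truncated), ("bartlett", bartlett),
                    ("parzen", parzen), ("tukey-hanning", tukey_hanning)]:
        print(name, kernel_constants(k))
```

For the truncated kernel both constants equal 1, and for the Bartlett kernel they are $1/3$ and $1/5$, which gives a quick check of the integration routine.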

Theorem 2.1. Suppose Assumption 2.1 and (3) holds with $\alpha = 8$. Assume $\log n =
o(m_n)$ and $m_n = o(n^{1/2})$. Under $H_0$, we have

$$\frac{nT_n - m_nC(K)}{\{2m_nD(K)\}^{1/2}} \to_D N(0,1). \qquad (4)$$

Remark 2.1. As pointed out by a referee, the 8-th moment condition on $u_t$ is fairly
strong and it excludes some interesting GARCH models, such as the IGARCH
model. In addition, the permissible parameter space for the regular GARCH($r, s$)
model is quite small under the 8-th moment assumption. At this point, we are unable
to relax this assumption as it seems necessary in our technical argument. Nevertheless,
the result above suggests that the asymptotic null distribution of Hong's
(1996) statistic is unaffected by unknown (weak) dependence. From a technical
point of view, the asymptotic null distribution of the BP statistic depends on the
fourth cumulants of $u_t$ since the number of lags $m$ is fixed. In contrast, for Hong's
statistic, as $m_n \to \infty$, the fourth cumulant effect appears to be asymptotically
negligible. For a fixed $m$, our result in Theorem 2.1 is not applicable.

The condition on the bandwidth is less restrictive than it looks. I am not aware
of any theoretical results on the optimal bandwidth choice for $T_n$ in the hypothesis
testing context. In terms of estimating the spectral density function, the optimal
bandwidth is $m_n = Cn^{1/5}$ if the kernel (e.g. Parzen kernel) is quadratic around
zero, and $m_n = Cn^{1/3}$ if the kernel (e.g. Bartlett kernel) is linear around zero. Note
that the problem of testing for white noise bears some resemblance to testing lack
of fit (or specification testing) in the nonparametric regression context. The latter
problem has been well studied in the literature and the data-driven bandwidth
choice for the smoothing type test has been addressed in Horowitz and Spokoiny
(2001) and Guerre and Lavergne (2005) among others.

For the optimal choice of the kernel function, we refer the reader to Hong
(1996) for more details. The consistency of $T_n$ is stated in the following theorem.

Theorem 2.2. Suppose Assumption 2.1 and (3) holds with $\alpha = 8$. Assume $1/m_n +
m_n/n \to 0$. Under $H_1$, we have

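$$\frac{m_n^{1/2}}{n} \cdot \frac{nT_n - m_nC(K)}{\{2m_nD(K)\}^{1/2}} \to_p \frac{\sum_{j=1}^{\infty} \rho_u^2(j)}{\{2D(K)\}^{1/2}}.$$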

Proof of Theorem 2.2. It follows from the argument in the proof of Theorem 6
of Hong (1996) by noting that $|R_u(j)| \le Cr^j$ for some $r \in [0, 1)$ and the absolute
summability of the fourth cumulants under GMC(4) [see Wu and Shao (2004),
Proposition 2]. We omit the details. ◊

Remark 2.2. In a related work, Chen and Deo (2006) considered the variance ratio 
statistic to test for white noise based on the first differenced series and proved that 
when the horizon $k$ satisfies $1/k + k/n = o(1)$, the asymptotic null distribution of
the variance ratio statistic is also robust to conditional heteroscedasticity of un- 
known form. Their result is akin to ours, in that the asymptotic null distribution 
of the test statistic is nuisance parameter free and the horizon k in variance ratio 
statistic plays a similar role as our bandwidth $m_n$. However, in their conditions
(A1)-(A6), the white noise process is assumed to be a sequence of martingale differences
with additional regularity conditions imposed on the higher order moments
(up to 8th); compare Deo (2000). Under our framework, the white noise process 
does not have to be martingale difference under the null. This has some practical 
implications since there are nonlinear time series models that are uncorrelated but 
are not martingale differences, as shown in Examples 2.1-2.3. From a technical
point of view, the relaxation of the martingale difference assumption, which was 
imposed in Hong and Lee (2003) and Chen and Deo (2006), is a very nontrivial 
step and is made feasible with the novel martingale approximation techniques; see 
Appendix for more discussions. 

Remark 2.3. For the BP test statistic, $K(x) = \mathbf{1}(|x| \le 1)$ and $C(K) = D(K) = 1$.
Thus the statement (4) reduces to $\{n\sum_{j=1}^{m_n} \hat\rho_u^2(j) - m_n\}/\sqrt{2m_n} \to_D N(0,1)$. In
the implementation of the BP test, we use the critical values based on $\chi^2(m_n)$ and
compare them with the realized value of $n\sum_{j=1}^{m_n} \hat\rho_u^2(j)$, whereas in Hong's test, the
critical values are based on the standard normal distribution. Loosely speaking,
the two procedures are asymptotically equivalent, since as $m_n \to \infty$, the central
limit theorem implies $\chi^2(m_n) \approx N(m_n, 2m_n)$. This suggests that the use of BP
test is valid in the presence of unknown weak dependence when $m_n$ is relatively
large compared to $n$.
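
The normal approximation invoked in Remark 2.3 can be spelled out as follows. This is a standard central limit argument written in the notation above and offered as a sketch; the symbols $Z_i$, $a$ and $z_{1-a}$ (the standard normal $1-a$ quantile) are introduced here for illustration only.

```latex
% Let Z_1, ..., Z_m be iid N(0,1), so that \chi^2(m) = \sum_{i=1}^m Z_i^2 in distribution.
% Since E(Z_i^2) = 1 and var(Z_i^2) = E(Z_i^4) - 1 = 3 - 1 = 2, the central limit theorem gives
\[
  \frac{\chi^2(m) - m}{\sqrt{2m}} \to_D N(0,1), \qquad m \to \infty .
\]
% Consequently the upper quantile of \chi^2(m) satisfies
\[
  \chi^2_{1-a}(m) = m + \sqrt{2m}\, z_{1-a} + o(\sqrt{m}),
\]
% and the chi-square based rule and the normal based rule agree asymptotically:
\[
  n \sum_{j=1}^{m_n} \hat\rho_u^2(j) > \chi^2_{1-a}(m_n)
  \iff
  \frac{n \sum_{j=1}^{m_n} \hat\rho_u^2(j) - m_n}{\sqrt{2 m_n}} > z_{1-a} + o(1).
\]
```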

3 When $u_t$ is unobservable

In practice, the errors $\{u_t\}_{t=1,2,\cdots,n}$ are often unobservable as a part of the model,
but can be estimated. Hong (1996) studied the residuals from a linear dynamic
model that includes both lagged dependent variables and exogenous variables. In
principle, our results can be extended to the residuals from any parametric time
series models with uncorrelated errors, including the setup studied by Hong (1996).
Instead of pursuing full generality, we shall treat the residuals from ARMA and
FARIMA models in Sections 3.1 and 3.2 respectively. This is motivated by the
recent interest in the ARMA models with dependent white noise errors [cf. Francq
and Zakoïan (2005), Francq et al. (2005) and the references therein] and goodness-of-fit
for long memory time series models [see Section 3.2 for more references].

3.1 ARMA model



Consider a stationary autoregressive and moving average (ARMA) time series
generated by

$$(1 - \alpha_1B - \cdots - \alpha_pB^p)X_t = (1 + \beta_1B + \cdots + \beta_qB^q)u_t, \qquad (5)$$

where $B$ is the backward shift operator, $\{u_t\}$ is a sequence of uncorrelated random
variables and $\Lambda = (\alpha_1, \cdots, \alpha_p, \beta_1, \cdots, \beta_q)'$ is an unknown parameter vector. Let
$\phi_\Lambda(z) = 1 - \alpha_1z - \cdots - \alpha_pz^p$ and $\psi_\Lambda(z) = 1 + \beta_1z + \cdots + \beta_qz^q$ be AR and MA
polynomials respectively. Denote by $\Lambda_0 = (\alpha_{10}, \cdots, \alpha_{p0}, \beta_{10}, \cdots, \beta_{q0})'$ the true
value of $\Lambda$ and assume that $\Lambda_0$ is an interior point of the set

$$\Lambda_\delta = \{\Lambda \in \mathbb{R}^{p+q}: \text{the roots of polynomials } \phi_\Lambda(z) \text{ and } \psi_\Lambda(z) \text{ have moduli} > 1 + \delta\}$$

for some $\delta > 0$. Following Francq et al. (2005), we call (5) a weak ARMA model
if $(u_t)$ is only uncorrelated, a semi-strong ARMA model if $(u_t)$ is a martingale
difference, and a strong ARMA model if $(u_t)$ is an iid sequence.

Denote by $\hat\Lambda_n = (\hat\alpha_{1n}, \cdots, \hat\alpha_{pn}, \hat\beta_{1n}, \cdots, \hat\beta_{qn})'$ the estimator of $\Lambda$. Then the
residuals $\hat u_t$, $t = 1, 2, \cdots, n$ are usually obtained by the following recursion

$$\hat u_t = X_t - \hat\alpha_{1n}X_{t-1} - \cdots - \hat\alpha_{pn}X_{t-p} - \hat\beta_{1n}\hat u_{t-1} - \cdots - \hat\beta_{qn}\hat u_{t-q}, \quad t = 1, 2, \cdots, n,$$

where the initial values $(X_0, X_{-1}, \cdots, X_{1-p})' = (\hat u_0, \cdots, \hat u_{1-q})' = 0$. Following
Francq et al. (2005), we test

$H_0$: $(X_t)$ has an ARMA($p, q$) representation (5)

against the alternative

$H_1$: $(X_t)$ does not admit an ARMA representation, or admits an ARMA($p', q'$)
representation with $p' > p$ or $q' > q$.

If $p$ and $q$ are correctly specified, we would expect the estimated residuals to behave
like a white noise sequence under $H_0$. The following theorem states the asymptotic
null distribution of the test statistic $T_{1n} = \sum_{j=1}^{n-1} K^2(j/m_n)\hat\rho_{\hat u}^2(j)$.

Theorem 3.1. Suppose the assumptions in Theorem 2.1 hold. Assume $\hat\Lambda_n - \Lambda_0 =
O_p(n^{-1/2})$. Then under $H_0$,

$$\frac{nT_{1n} - m_nC(K)}{\{2m_nD(K)\}^{1/2}} \to_D N(0,1).$$

The proof of Theorem 3.1 follows the argument used in the proof of Theorem 3.2
below and is simpler. We omit the details. Note that as a common feature of
smoothing-type tests, the use of the residuals $\{\hat u_t\}$ in place of the true unobservable
errors $\{u_t\}$ has no impact on the limiting distribution.
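
The residual recursion of this subsection, with zero initial values, can be sketched as follows. The function name `arma_residuals` and the convention of passing the AR and MA coefficient estimates as Python lists are illustrative assumptions; the estimation of the coefficients themselves is not covered.

```python
def arma_residuals(x, alpha, beta):
    """Residuals of (1 - a_1 B - ... - a_p B^p) X_t = (1 + b_1 B + ... + b_q B^q) u_t.

    Implements u_hat_t = X_t - sum_i alpha_i X_{t-i} - sum_k beta_k u_hat_{t-k},
    with X_t and u_hat_t set to zero for t <= 0.
    """
    res = []
    for t in range(len(x)):
        val = x[t]
        for i, a in enumerate(alpha, start=1):
            if t - i >= 0:  # terms with non-positive time index are zero
                val -= a * x[t - i]
        for k, b in enumerate(beta, start=1):
            if t - k >= 0:
                val -= b * res[t - k]
        res.append(val)
    return res


if __name__ == "__main__":
    # ARMA(1,1) data built from u = (1, 2, -1, 0.5) with alpha_1 = 0.5, beta_1 = 0.4
    x = [1.0, 2.9, 1.25, 0.725]
    print(arma_residuals(x, [0.5], [0.4]))  # recovers u up to rounding error
```

The resulting residual series would then be passed to the autocorrelation and test statistic computations in place of the unobserved errors.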

Remark 3.1. In the simulation studies of Francq et al. (2005), it can be seen that
when $m$ is large relative to $n$, the level of the BP test is reasonably close to the
nominal one. Here our result provides theoretical support for this phenomenon
since if we let $K$ be the truncated kernel, the resulting test statistic is exactly
the same as BP's. As commented in Remark 2.3, the difference between the use
of the $\chi^2$-based critical values as done in BP test, and the use of the $N(0,1)$-based
critical values for Hong's test is asymptotically negligible since the number
of model parameters (i.e. $p + q$) is fixed and $m_n \to \infty$. Therefore, it seems fair to
say that the use of BP test is still justified when the lag truncation number $m$ is
large, as the unknown dependence in $u_t$ does not kick in asymptotically.

As mentioned in Francq et al. (2005), weak ARMA models can arise from 
various situations, such as transformation of strong ARMA processes, causal rep- 
resentation of noncausal ARMA processes and nonlinear processes. In the sequel, 
we demonstrate that the GMC condition for the noise process in the weak ARMA 
representation can be verified for the two leading examples in Francq et al. (2005).

Example 3.1. Consider the process

$$X_t - aX_{t-1} = \varepsilon_t - b\varepsilon_{t-1}, \quad a, b \in (-1, 1),$$

where $\varepsilon_t$ are iid random variables with $E(\varepsilon_t) = 0$ and $\varepsilon_t \in \mathcal{L}^\alpha$, $\alpha \ge 1$. Let $Y_t = X_{2t}$.
Then $Y_t - a^2Y_{t-1} = \zeta_t = u_t - \theta u_{t-1}$, where $\theta \in (-1, 1)$, $\zeta_t = \varepsilon_{2t} + (a - b)\varepsilon_{2t-1} -
ab\varepsilon_{2t-2}$, $u_t$ is white noise and $u_t = R_{1t} + R_{2t} + \theta\zeta_{t-1}$, where $R_{1t} = -ab\varepsilon_{2t-2} +
\theta^2\varepsilon_{2t-4} + \varepsilon_{2t} + (a - b)\varepsilon_{2t-1} + \theta^2[(a - b)\varepsilon_{2t-5} - ab\varepsilon_{2t-6}]$ and $R_{2t} = \sum_{i \ge 3} \theta^i\zeta_{t-i}$. It
is easily seen that $\zeta_t$ and $R_{1t}$ satisfy GMC($\alpha$). By Theorem 5.2 in Shao and Wu
(2007), $R_{2t}$ also satisfies GMC($\alpha$). Therefore, $u_t$ is GMC($\alpha$).

Example 3.2. Consider the process

$$X_t = \varepsilon_t - \phi\varepsilon_{t-1}, \quad |\phi| > 1.$$

Let $u_t = \sum_{i=0}^{\infty} \phi^{-i}X_{t-i}$. Then $X_t$ admits the causal MA(1) representation: $X_t =
u_t - \phi^{-1}u_{t-1}$. Since $X_t$ is GMC($\alpha$), $u_t$ is also GMC($\alpha$) by Theorem 5.2 in Shao
and Wu (2007).

Remark 3.2. To study the local power of $T_{1n}$, we follow Hong (1996) and define
the local alternative $H_{1n}: f_{un}(w) = (2\pi)^{-1} + a_ng(w)$ for $w \in [-\pi, \pi]$, where
$a_n = o(1)$. The function $g$ is symmetric, $2\pi$-periodic and satisfies $\int_{-\pi}^{\pi} g(w)dw = 0$,
which ensures that $f_{un}$ is a valid normalized spectral density function for large $n$.
Let $\mu(K) = 2\pi\int_{-\pi}^{\pi} g^2(w)dw/(2D(K))^{1/2}$. It can be shown that under $H_{1n}$ with
$a_n = m_n^{1/4}/n^{1/2}$,

$$\frac{nT_{1n} - m_nC(K)}{\{2m_nD(K)\}^{1/2}} \to_D N(\mu(K), 1) \qquad (6)$$

provided that $\hat\Lambda_n - \Lambda_0 = O_p(n^{-1/2})$ and the assumptions in Theorem 2.1 hold.
Since the proof basically repeats the argument in the proof of Hong's (1996) Theorem 4,
we omit the details. It is worth mentioning that the above asymptotic
distribution (6) under the local alternative still holds for $T_n$, whereas a similar
result for $T_{2n}$ [see Section 3.2 for the definition] in the long memory case may still
hold but the proof seems tedious and is thus not pursued. Compared to the Box-Pierce
test with a fixed $m$, Hong's test is locally less powerful in that Box-Pierce's
test has nontrivial power against the local alternative of order $n^{-1/2}$. On the
other hand, Box-Pierce's test only has trivial power against non-zero correlations
at lags beyond $m$, whereas Hong's test is able to detect non-zero correlations at
any nonzero lags asymptotically.

3.2 FARIMA model 

In this subsection, we extend our result to the goodness-of-fit problem for long 
memory time series. A commonly used model in the long memory time series 
literature is the FARIMA model: 

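$$(1 - B)^d\phi_\Lambda(B)Y_t = \psi_\Lambda(B)u_t, \qquad (7)$$

where $d \in (0, 1/2)$ is the long memory parameter. Let $\theta = (d, \Lambda')'$ and denote
by $\theta_0 = (d_0, \Lambda_0')'$ its true value. Assume that $\theta_0$ lies in the interior of $\Theta_\delta =
[\Delta_1, \Delta_2] \times \Lambda_\delta$, where $0 < \Delta_1 < \Delta_2 < 1/2$ and $\Lambda_\delta$ denotes the ARMA parameter
set of Section 3.1.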

Testing goodness of fit for short/long memory time series models has attracted 
a lot of attention recently. Most tests were constructed in the frequency domain 
and they can be roughly categorized into two types: spectral density based test 
and spectral distribution function based test. Tests developed by Hong (1996),
Paparoditis (2000), Chen and Deo (2004a) are of the first type and they usually
involve a smoothing parameter and have trivial power against $n^{-1/2}$ local alternatives.
The advantage of this type of tests is that the asymptotic null distributions
are free of nuisance parameters. For the second type, see Beran (1992), Chen 
and Romano (1999), Delgado et al. (2005) and Hidalgo and Kreiss (2006), among 
others. Typically, the tests of this type avoid the issue of choosing the smoothing 
parameter and they can distinguish the alternatives within $n^{-1/2}$-neighborhoods of
the null model. However, a disadvantage associated with this kind of tests is that 
the asymptotic null distributions often depend on the underlying data generating 
mechanism and are not asymptotically distribution-free. The martingale trans- 
form method [see Delgado et al. (2005)] and the bootstrap approach [Chen and 
Romano (1999), Hidalgo and Kreiss (2006)] have been utilized to make the tests 
practically usable. So far, the tests proposed by Chen and Deo (2004a), Delgado 
et al. (2005) and Hidalgo and Kreiss (2006) have been justified to work for long 
memory time series models. However, they assumed either Gaussian processes or 
linear processes with the noise processes being conditionally homoscedastic mar- 
tingale differences, which exclude interesting models, such as FARIMA models 
with unknown GARCH errors. 

Since $d_0 \in (0, 1/2)$, the process $Y_t$ is invertible. We have the following autoregressive
representation

$$u_t = \sum_{k=0}^{\infty} e_k(\theta_0)Y_{t-k}.$$

Given the observations $Y_t$, $t = 1, 2, \cdots, n$, we follow Beran (1995) and form the
residuals by

$$\hat u_t = \sum_{j=0}^{t-1} e_j(\hat\theta_n)Y_{t-j}, \quad t = 1, 2, \cdots, n, \qquad (8)$$

where $\hat\theta_n$ is an estimator of $\theta$. Similar to the ARMA case, the null and alternative
hypotheses are

$H_0$: $(Y_t)$ has a FARIMA($p, d, q$) representation

and

$H_1$: $(Y_t)$ does not admit a FARIMA representation, or admits a FARIMA($p', d, q'$)
representation with $p' > p$ or $q' > q$.

The test statistic is $T_{2n} = \sum_{j=1}^{n-1} K^2(j/m_n)\hat\rho_{\hat u}^2(j)$, where $\{\hat u_t\}_{t=1}^{n}$ are from (8).

Theorem 3.2. Suppose that the assumptions in Theorem 2.1 hold. Assume $\hat\theta_n -
\theta_0 = O_p(n^{-1/2})$. Then under $H_0$, we have

$$\frac{nT_{2n} - m_nC(K)}{\{2m_nD(K)\}^{1/2}} \to_D N(0,1).$$
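
The truncated autoregressive filter in (8) is easy to sketch in the pure fractional case $p = q = 0$, an assumption made here only to keep the example short. In that case the coefficients $e_j(\theta)$ reduce to the expansion coefficients $\pi_j(d)$ of $(1 - B)^d = \sum_{j \ge 0} \pi_j(d)B^j$, which satisfy $\pi_0 = 1$ and $\pi_j = \pi_{j-1}(j - 1 - d)/j$. The function names are illustrative.

```python
def frac_diff_coeffs(d, n):
    """First n coefficients pi_0, ..., pi_{n-1} of (1 - B)^d."""
    coeffs = [1.0]
    for k in range(1, n):
        coeffs.append(coeffs[-1] * (k - 1 - d) / k)
    return coeffs


def farima0d0_residuals(y, d):
    """u_hat_t = sum_{j=0}^{t-1} pi_j(d) * Y_{t-j}, t = 1, ..., n (FARIMA(0, d, 0) only)."""
    n = len(y)
    pi = frac_diff_coeffs(d, n)
    # y is 0-indexed, so the residual at position t uses y[t], y[t-1], ..., y[0]
    return [sum(pi[j] * y[t - j] for j in range(t + 1)) for t in range(n)]


if __name__ == "__main__":
    print(frac_diff_coeffs(0.4, 4))
    print(farima0d0_residuals([1.0, 1.0, 1.0], 0.4))
```

With ARMA components present, the coefficients would instead come from expanding $(1 - z)^d\phi_\Lambda(z)/\psi_\Lambda(z)$, which this sketch does not attempt.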



The result presented above is a new contribution to the literature, even for
the model (7) with iid errors. Here we can take the Whittle pseudo-maximum
likelihood estimator as $\hat\theta_n$. The root-$n$ asymptotic normality of the Whittle estimator
for long memory time series models with general white noise errors has been
established by Hosoya (1997) and Shao (2010).

Remark 3.3. Hong's (1996) statistic has been reformulated in the discrete form by 
Chen and Deo (2004a), who showed asymptotic equivalence of the two statistics 
for Gaussian long memory time series. Note that the applicability of Chen and 
Deo's (2004a) test statistic has only been proved for the Gaussian case. The latter 
authors conjectured that their assumptions can be relaxed to allow long memory 
linear processes with iid innovations. The work presented here partially solves 
their conjecture and our results even allow for dependent innovations. 

A limitation of our theory is that we need to assume the mean of $Y_t$ is known.
In practice, if the mean is unknown, we need to modify our $\hat u_t$ [cf. (8)] by replacing
$Y_t$ with $Y_t - \bar Y_n$, where $\bar Y_n = n^{-1}\sum_{t=1}^{n} Y_t$. It turns out that our technical arguments
are no longer valid with this modification except for the case $d_0 \in (0, 1/4)$
with additional restrictions on $m_n$. The main reason is that the sample mean of a
long memory time series converges to the population mean relatively slowly at the
rate of $n^{1/2 - d_0}$. The larger $d_0$ is, the slower it becomes. When $d_0 \in [1/4, 1/2)$,
the effect of mean adjustment becomes asymptotically non-negligible. As pointed
out by a referee, Chen and Deo's (2004a) frequency domain test statistic is mean
invariant, so no mean adjustment is needed. It might be possible to extend the
theory presented in Chen and Deo (2004a) directly to the case of dependent innovations,
but such an extension seems very challenging and is beyond the scope of
this paper. In the short memory case, i.e. $d_0 = 0$, the mean adjustment does not
affect the asymptotic null distribution of the test statistic $T_{1n}$. In other words,
Theorem 3.1 still holds if we use the mean adjusted residuals in the calculation of
$T_{1n}$.

Remark 3.4. It seems natural to ask if a central limit theorem for statistics based 
on Bartlett's Tp process can be obtained under the GMC conditions on the errors. 
Although it might be possible to obtain a non-pivotal asymptotic null distribution 
under GMC conditions, the martingale transformation method used in Delgado et 
al. (2005) and the frequency domain bootstrap approach in Hidalgo and Kreiss 
(2006) may no longer be able to take care of the estimation effect for the long 
memory model with unknown conditional heteroscedastic errors. The main reason 
is that the validity of both approaches relies on the assumption that the fourth
order spectrum of the innovation sequence is a constant, which happens to be true 
for conditional homoscedastic martingale differences [cf. Shao (2010)] . In the case 
of conditional heteroscedastic errors, I am not aware of any feasible tests based 
on Bartlett's Tp process. Further study along this direction would be certainly 
interesting. 

4 Conclusions 

In this paper, we showed that Hong's (1996) test is robust to conditional het- 
eroscedasticity of unknown form in large sample theory and is applicable to a 
large class of dependent white noise series. Further, when applied to the residuals 
from short/long memory time series models, the asymptotic null distribution is
still valid. The main focus of this paper is on the theoretical aspect, although the 
empirical performance is also very important. The finite sample performance of 
Hong's test statistic has been examined by Hong (1996) and Chen and Deo (2004b) 
among others to assess the goodness of fit of time series models with iid errors. 
It was found that the sampling distribution of the test statistic is right-skewed, 
and the size distortion can presumably be reduced by adopting a power transfor- 
mation method [Chen and Deo (2004b)] or frequency domain bootstrap approach 
[Paparoditis (2000)]. The performance of the afore-mentioned test statistics along 
with size-correction devices have yet to be examined for time series models with 
dependent errors. An in-depth study is certainly worthwhile, and will be pursued 
in a separate work. 
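
For readers who wish to experiment with the finite sample behavior mentioned above, the following is a minimal sketch of a kernel-weighted Box-Pierce type statistic. It is an illustration only: the Bartlett kernel, the finite sample centering `c_n` and scaling `d_n` (one common standardization of Hong's (1996) statistic), and all function names are assumptions made here, not definitions taken from this paper.

```python
import numpy as np

def bartlett_kernel(x):
    """Bartlett kernel K(x) = 1 - |x| for |x| <= 1 and 0 otherwise (an assumed choice)."""
    x = np.abs(x)
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def sample_autocorr(u, max_lag):
    """Mean-adjusted sample autocorrelations at lags 1, ..., max_lag."""
    u = np.asarray(u, dtype=float)
    n = u.size
    v = u - u.mean()
    r0 = np.dot(v, v) / n
    return np.array([np.dot(v[j:], v[:n - j]) / n / r0
                     for j in range(1, max_lag + 1)])

def kernel_white_noise_stat(u, m, kernel=bartlett_kernel):
    """Standardized kernel-weighted sum of squared sample autocorrelations.

    m is the lag truncation (bandwidth) number and should exceed 1.
    Large positive values point towards serial correlation.
    """
    n = len(u)
    lags = np.arange(1, n)                 # candidate lags j = 1, ..., n-1
    k2 = kernel(lags / m) ** 2             # squared kernel weights
    active = k2 > 0                        # keep lags with nonzero weight
    lags, k2 = lags[active], k2[active]
    rho = sample_autocorr(u, int(lags.max()))[lags - 1]
    c_n = np.sum((1 - lags / n) * k2)                               # centering
    d_n = np.sum((1 - lags / n) * (1 - (lags + 1) / n) * k2 ** 2)   # scaling
    return (n * np.sum(k2 * rho ** 2) - c_n) / np.sqrt(2 * d_n)
```

Under the white noise null the statistic is compared with the upper tail of the standard normal distribution; as noted above, its sampling distribution is right-skewed in small samples, so the nominal critical values should be treated with caution.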

5 Technical Appendices

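Throughout the appendices, $u_t$ is assumed to be an uncorrelated stationary
sequence with the representation (2). For the convenience of notation, let
$k_{nj} = K(j/m_n)$. Denote by $Z_{jt} = u_t u_{t-j}$ and
$D_{j,k} = \sum_{t=k}^{\infty} \mathcal{P}_k(Z_{jt})$. Note that for each
$j \in \mathbb{N}$, $D_{j,k}$ is a sequence of stationary and ergodic martingale
differences. For $a, b \in \mathbb{R}$, denote by $a \vee b = \max(a,b)$ and
$a \wedge b = \min(a,b)$. Let $\mathcal{F}_i^j = (\varepsilon_i, \cdots, \varepsilon_j)$
and $\mathcal{F}_t' = (\cdots, \varepsilon_{-1}', \varepsilon_0', \varepsilon_1, \cdots, \varepsilon_t)$,
$t \in \mathbb{N}$. For $X \in \mathcal{L}^1$, denote by
$\mathcal{P}_t' X = E(X|\mathcal{F}_t') - E(X|\mathcal{F}_{t-1}')$. Let
$u_k^* = F(\cdots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \cdots, \varepsilon_k)$,
$k \in \mathbb{N}$. Denote by $\delta_\alpha(k) = \|u_k - u_k^*\|_\alpha$, $k \in \mathbb{N}$,
$\alpha \ge 1$ the physical dependence measure introduced by Wu (2005). According to
Wu (2007), we have
$\|\mathcal{P}_0 Z_{jk}\|_\alpha \le C(\delta_{2\alpha}(k) + \delta_{2\alpha}(k-j)1(k \ge j))$
if $u_t \in \mathcal{L}^{2\alpha}$, and $\delta_\alpha(k) \le C r^k$ for some $r \in (0,1)$
provided that $u_t$ is GMC($\alpha$), $\alpha \ge 1$.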
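
To make the physical dependence measure concrete, the sketch below estimates $\|u_k - u_k^*\|_2$ by Monte Carlo for an ARCH(1) process, where the coupled series is driven by the same innovations except that the innovation at time 0 is replaced by an independent copy. The ARCH(1) specification, the parameter values and the function name are assumptions chosen purely for illustration; the geometric decay of the estimates is what the bound $\delta_\alpha(k) \le Cr^k$ describes.

```python
import numpy as np

def arch1_dependence_measure(alpha1=0.3, omega=1.0, max_lag=10,
                             n_rep=20000, burn_in=200, seed=0):
    """Monte Carlo estimate of delta_2(k) = ||u_k - u_k^*||_2, k = 0..max_lag.

    Model (assumed for illustration): u_t = eps_t * sqrt(omega + alpha1 * u_{t-1}^2)
    with iid standard normal eps_t. The coupled path u_k^* uses the same
    innovations except that eps_0 is replaced by an independent copy.
    """
    rng = np.random.default_rng(seed)
    # Shared past: run the recursion up to time -1 in every replication.
    u_prev = np.zeros(n_rep)
    for _ in range(burn_in):
        u_prev = rng.standard_normal(n_rep) * np.sqrt(omega + alpha1 * u_prev ** 2)
    # Time 0: original innovation versus an independent copy.
    scale0 = np.sqrt(omega + alpha1 * u_prev ** 2)
    u = rng.standard_normal(n_rep) * scale0
    u_star = rng.standard_normal(n_rep) * scale0
    deltas = [np.sqrt(np.mean((u - u_star) ** 2))]
    # Times 1..max_lag: both paths share the same new innovations.
    for _ in range(max_lag):
        eps = rng.standard_normal(n_rep)
        u = eps * np.sqrt(omega + alpha1 * u ** 2)
        u_star = eps * np.sqrt(omega + alpha1 * u_star ** 2)
        deltas.append(np.sqrt(np.mean((u - u_star) ** 2)))
    return np.array(deltas)
```

In this model each step shrinks the mean squared gap between the two paths by at most the factor `alpha1`, so the estimated values fall off geometrically in the lag.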

One of the major technical contributions of this paper is to replace the martingale
difference assumption in Hong and Lee (2003) by the GMC condition under the
white noise null hypothesis. This is achieved by approximating the double array
sequence $\sum_{t=j+1}^{n} Z_{jt}$ using its martingale counterpart
$\sum_{t=j+1}^{n} D_{j,t}$ for $j = 1, \cdots, m_n$. Note that the martingale
approximation for the single array sequence $u_t$ has been well studied [cf. Hsing
and Wu (2004), Wu and Woodroofe (2004), Wu and Shao (2007) among others], but the
techniques there are not directly applicable. The major difficulty is that in our
setting the martingale approximation error has to be bounded uniformly in
$j = 1, \cdots, m_n$ and the application of the martingale central limit theorem
after martingale approximation requires very delicate analysis due to the presence
of dependence.
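
As a sanity check on the construction, consider the special case in which $u_t$ is itself a martingale difference sequence with respect to $\mathcal{F}_t$ with finite fourth moment (an assumption made only for this illustration). The short derivation below shows that the martingale approximation is then exact, which is why the martingale difference setting of Hong and Lee (2003) requires no approximation step.

```latex
% Assumption for this illustration: E(u_t | F_{t-1}) = 0, and j >= 1.
% For t > k, u_{t-j} is F_{t-1}-measurable, so by the tower property
\begin{align*}
E(u_t u_{t-j} \mid \mathcal{F}_k)
  &= E\{ u_{t-j}\, E(u_t \mid \mathcal{F}_{t-1}) \mid \mathcal{F}_k \} = 0,
  \qquad t > k, \\
E(u_t u_{t-j} \mid \mathcal{F}_{k-1}) &= 0, \qquad t \ge k .
\end{align*}
% Hence P_k Z_{jt} = 0 for t > k and P_k Z_{jk} = u_k u_{k-j}, so that
\[
D_{j,k} = \sum_{t=k}^{\infty} \mathcal{P}_k (Z_{jt}) = u_k u_{k-j} = Z_{jk},
\qquad
\sum_{t=j+1}^{n} Z_{jt} - \sum_{t=j+1}^{n} D_{j,t} = 0 .
\]
```

Under the GMC condition alone the difference is no longer zero, and controlling it uniformly in $j$ is the task of the lemmas that follow.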

We separate the proofs of Theorem 2.1 and Theorem 3.2 along with necessary
lemmas into Appendices A and B respectively.

5.1 Appendix A 

Let $\theta_{j,r,\alpha} = \|\mathcal{P}_0 Z_{jr}\|_\alpha$, $\alpha \ge 1$ and
$\Theta_{j,n,\alpha} = \sum_{r=n}^{\infty} \theta_{j,r,\alpha}$. The following lemma is an
extension of Theorem 1 (ii) in Wu (2007). Since the proof basically repeats that
in Wu (2007), we omit the details.

Lemma 5.1. Assume $u_t \in \mathcal{L}^{2\alpha}$, $\alpha \ge 2$. For $0 < a_n < b_n \le n$, we have



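Part (a) of the lemma below states the variance and covariances of the
approximating martingale difference $D_{j,k}$ and may be of independent interest.

Lemma 5.2. Assume that $u_t$ is GMC(8). (a) For $j > 0$, we have

$$E(D_{j,k}^2) = \sigma^4 + \sum_{k \in \mathbb{Z}} \mathrm{cum}(u_0, u_{-j}, u_k, u_{k-j})$$

and

$$E(D_{j,k}D_{j',k}) = (1/2)\sum_{k \in \mathbb{Z}} \{\mathrm{cum}(u_0, u_{-j}, u_k, u_{k-j'}) + \mathrm{cum}(u_0, u_{-j'}, u_k, u_{k-j})\}$$

when $j \ne j' > 0$. (b) Let $D_{j,k}' = \sum_{t=k}^{\infty} \mathcal{P}_k'(u_t' u_{t-j}')$. Then
$\|D_{j,k} - D_{j,k}'\|_4 \le C(\rho^{k-j} 1(k \ge j) + |j-k| 1(k < j))$.
(c) Let $\tilde D_{j,k} = E(D_{j,k}|\varepsilon_k, \cdots, \varepsilon_{k-l+1})$, $l \in \mathbb{N}$.
Then $\|D_{j,k} - \tilde D_{j,k}\|_4 \le C(\rho^{l-j} 1(l \ge j) + |j-l| 1(l < j))$.
Here the positive constant $C$ appearing in (b) and (c) is independent of $j$.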

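Proof of Lemma 5.2. (a) It follows that when $j = j' > 0$,

$$E(D_{j,k}^2) = \sum_{k \in \mathbb{Z}} \mathrm{cov}(u_t u_{t-j}, u_{t+k}u_{t+k-j}) = \sigma^4 + \sum_{k \in \mathbb{Z}} \mathrm{cum}(u_0, u_{-j}, u_k, u_{k-j}),$$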
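
and when $j \ne j' > 0$,

$$E(D_{j,k}D_{j',k}) = (1/4)\sum_{k \in \mathbb{Z}} \{\mathrm{cov}(u_t u_{t-j} + u_t u_{t-j'}, u_{t+k}u_{t+k-j} + u_{t+k}u_{t+k-j'})$$
$$\qquad - \mathrm{cov}(u_t u_{t-j} - u_t u_{t-j'}, u_{t+k}u_{t+k-j} - u_{t+k}u_{t+k-j'})\}$$
$$= (1/2)\sum_{k \in \mathbb{Z}} \{\mathrm{cov}(u_t u_{t-j}, u_{t+k}u_{t+k-j'}) + \mathrm{cov}(u_t u_{t-j'}, u_{t+k}u_{t+k-j})\}$$
$$= (1/2)\sum_{k \in \mathbb{Z}} \{\mathrm{cum}(u_0, u_{-j}, u_k, u_{k-j'}) + \mathrm{cum}(u_0, u_{-j'}, u_k, u_{k-j})\}.$$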

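(b) In general, for $V_t = J(\cdots, \varepsilon_{t-1}, \varepsilon_t)$ and
$V_t' = J(\mathcal{F}_t')$, we have $E(V_t'|\mathcal{F}_k) = E(V_t|\mathcal{F}_k')$ when $t \ge k$.
So for $\alpha \ge 1$,

$$\|E(V_t|\mathcal{F}_k) - E(V_t'|\mathcal{F}_k')\|_\alpha \le \|E(V_t|\mathcal{F}_k) - E(V_t'|\mathcal{F}_k)\|_\alpha + \|E(V_t|\mathcal{F}_k') - E(V_t'|\mathcal{F}_k')\|_\alpha \le 2\|V_t - V_t'\|_\alpha,$$

which implies that

$$\|\mathcal{P}_k V_t - \mathcal{P}_k' V_t'\|_\alpha \le 4\|V_t - V_t'\|_\alpha. \qquad (9)$$

Note that $D_{j,k} = \sum_{t=k}^{\infty} \mathcal{P}_k(u_t u_{t-j})$ and
$D_{j,k}' = \sum_{t=k}^{\infty} \mathcal{P}_k'(u_t' u_{t-j}')$. Then when
$k \le t \le k+j-1$, $\mathcal{P}_k(u_t u_{t-j}) = u_{t-j}\mathcal{P}_k u_t$ and
$\mathcal{P}_k'(u_t' u_{t-j}') = u_{t-j}'\mathcal{P}_k' u_t'$. So by the
Cauchy-Schwarz inequality and (9),

$$\|D_{j,k} - D_{j,k}'\|_4 \le \sum_{t=k}^{k+j-1} \|u_{t-j}\mathcal{P}_k u_t - u_{t-j}'\mathcal{P}_k' u_t'\|_4 + \sum_{t=k+j}^{\infty} \|\mathcal{P}_k(u_t u_{t-j}) - \mathcal{P}_k'(u_t' u_{t-j}')\|_4$$
$$\le C\sum_{t=k}^{k+j-1} \{\|u_{t-j} - u_{t-j}'\|_8 + \|\mathcal{P}_k u_t - \mathcal{P}_k' u_t'\|_8\} + C\sum_{t=k+j}^{\infty} \|u_t u_{t-j} - u_t' u_{t-j}'\|_4$$
$$\le C\{\rho^{k-j} 1(k \ge j) + |j-k| 1(k < j)\}.$$

As to (c), applying the fact that
$E(D_{j,l}|\varepsilon_1, \cdots, \varepsilon_l) = E(D_{j,l}'|\mathcal{F}_l)$, we get

$$\|D_{j,k} - \tilde D_{j,k}\|_4 = \|D_{j,l} - \tilde D_{j,l}\|_4 = \|D_{j,l} - E(D_{j,l}|\varepsilon_1, \cdots, \varepsilon_l)\|_4$$
$$= \|E(D_{j,l} - D_{j,l}'|\mathcal{F}_l)\|_4 \le \|D_{j,l} - D_{j,l}'\|_4 \le C\{\rho^{l-j} 1(l \ge j) + |j-l| 1(l < j)\}.$$

The proof is complete.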







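Proof of Theorem 2.1. Since $\hat R_u(0) = \sigma^2 + O_p(n^{-1/2})$, we have

$$n\sum_{j=1}^{m_n} k_{nj}^2 \hat\rho_u^2(j) = n\sigma^{-4}\sum_{j=1}^{m_n} k_{nj}^2 \hat R_u^2(j) + o_p(m_n^{1/2}).$$

Let $G_n := n\sum_{j=1}^{m_n} k_{nj}^2 \tilde R_u^2(j)$, where
$\tilde R_u(j) = n^{-1}\sum_{t=|j|+1}^{n} u_t u_{t-|j|}$. Note that
$\hat R_u(j) - \tilde R_u(j) = \bar u\{(1 - j/n)\bar u - n^{-1}\sum_{t=1}^{n-j} u_t - n^{-1}\sum_{t=j+1}^{n} u_t\}$
for $j \ge 1$. Under GMC(2), $\bar u^2 = O_p(n^{-1})$ and
$\sum_{j=1}^{m_n} k_{nj}^2 E(\sum_{t=1}^{n-j} u_t + \sum_{t=j+1}^{n} u_t)^2 = O(n m_n)$.
Consequently, $n\sum_{j=1}^{m_n} k_{nj}^2 (\hat R_u(j) - \tilde R_u(j))^2 = o_p(1)$.
Then it suffices to show

$$\frac{G_n - \sigma^4 m_n C(K)}{(2\sigma^8 m_n D(K))^{1/2}} \to_D N(0,1). \qquad (10)$$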
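
The expression for the difference between the mean-adjusted and the unadjusted sample autocovariance used above is an exact algebraic identity, and it can be checked numerically. The following sketch does so; the function names are hypothetical and the simulated data are arbitrary.

```python
import numpy as np

def autocov_mean_adjusted(u, j):
    """n^{-1} * sum_{t=j+1}^{n} (u_t - ubar)(u_{t-j} - ubar)."""
    n = u.size
    ubar = u.mean()
    return np.sum((u[j:] - ubar) * (u[:n - j] - ubar)) / n

def autocov_raw(u, j):
    """n^{-1} * sum_{t=j+1}^{n} u_t u_{t-j}, with no mean adjustment."""
    n = u.size
    return np.sum(u[j:] * u[:n - j]) / n

def mean_adjustment_gap(u, j):
    """ubar * {(1 - j/n) ubar - n^{-1} sum_{t=1}^{n-j} u_t - n^{-1} sum_{t=j+1}^{n} u_t}."""
    n = u.size
    ubar = u.mean()
    return ubar * ((1 - j / n) * ubar - u[:n - j].sum() / n - u[j:].sum() / n)
```

For any series and any lag $1 \le j < n$, `autocov_mean_adjusted(u, j) - autocov_raw(u, j)` agrees with `mean_adjustment_gap(u, j)` up to floating point error, which is why the effect of the mean adjustment is governed by the size of the sample mean.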



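We shall approximate $G_n$ by
$\tilde G_n = n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 (\sum_{k=j+1}^{n} D_{j,k})^2$. By the
Cauchy-Schwarz inequality,

$$E|G_n - \tilde G_n| \le \Big\{\frac{1}{n}\sum_{j=1}^{m_n} k_{nj}^2 E\Big[\sum_{k=j+1}^{n} (Z_{jk} - D_{j,k})\Big]^2\Big\}^{1/2} \Big\{\frac{1}{n}\sum_{j=1}^{m_n} k_{nj}^2 E\Big[\sum_{k=j+1}^{n} (Z_{jk} + D_{j,k})\Big]^2\Big\}^{1/2},$$

where the second term on the right hand side of the inequality is easily shown to
be $O(m_n)$ in view of the proof to be presented hereafter. As to the first term, we
apply Lemma 5.1 and get

$$\frac{1}{n}\sum_{j=1}^{m_n} k_{nj}^2 E\Big[\sum_{k=j+1}^{n} (Z_{jk} - D_{j,k})\Big]^2 \le \frac{C}{n}\sum_{j=1}^{m_n}\sum_{h=1}^{\infty}\Big(\sum_{k=h}^{\infty} \theta_{j,k,2}\Big)^2$$
$$\le \frac{C}{n}\sum_{j=1}^{m_n}\sum_{h=1}^{\infty}\Big(\sum_{k=h}^{\infty} (\delta_4(k) + \delta_4(k-j)1(k \ge j))\Big)^2 \le C m_n^2/n = o(1).$$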
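
So $G_n = \tilde G_n + o_p(m_n^{1/2})$. Write

$$\tilde G_n = n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \Big(\sum_{k=j+1}^{n} D_{j,k}\Big)^2 = n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \sum_{k=j+1}^{n} D_{j,k}^2 + 2n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \sum_{k=j+2}^{n}\sum_{r=j+1}^{k-1} D_{j,k}D_{j,r} =: G_{1n} + G_{2n}.$$

Under the assumption that $u_t$ is GMC(8), it is easy to show that $u_t^2$ is GMC(4),
which implies that $|\mathrm{cov}(u_t^2, u_{t-j}^2)| \le C r^j$ for some $r \in (0,1)$.
So by Lemma 5.2,

$$E(G_{1n}) = n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 (n-j)\Big\{\sigma^4 + \mathrm{cov}(u_t^2, u_{t-j}^2) + \sum_{k \ne 0} \mathrm{cum}(u_0, u_k, u_{-j}, u_{k-j})\Big\}$$
$$= \sigma^4 \sum_{j=1}^{m_n} k_{nj}^2 + O(1) = \sigma^4 m_n C(K) + O(1),$$

where we have applied the absolute summability of the 4-th joint cumulants under
GMC(4) [Wu and Shao (2004), Proposition 2]. Let
$\tilde D_{j,k} = E(D_{j,k}|\varepsilon_k, \varepsilon_{k-1}, \cdots, \varepsilon_{k-l+1})$,
where $l = l_n = 2m_n$. By Lemma 5.2 and the assumption that $\log n = o(m_n)$,

$$\sup_{j=1,\cdots,m_n} \|D_{j,k} - \tilde D_{j,k}\|_4 = O(n^{-\kappa}) \text{ for any } \kappa > 0. \qquad (11)$$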

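Write

$$G_{1n} = n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \sum_{k=j+1}^{n} \tilde D_{j,k}^2 + n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \sum_{k=j+1}^{n} (D_{j,k}^2 - \tilde D_{j,k}^2) = G_{11n} + G_{12n},$$

where $\mathrm{var}(G_{11n}) = O(m_n^3/n) = o(m_n)$ by the $l_n$-dependence of
$\tilde D_{j,k}$, and by (11),

$$\|G_{12n}\| \le n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \sum_{k=j+1}^{n} \|D_{j,k}^2 - \tilde D_{j,k}^2\| = o(1).$$

So (10) follows if we can show that $G_{2n}/(2\sigma^8 m_n D(K))^{1/2} \to_D N(0,1)$.
Write

$$G_{2n} = 2n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \Big(\sum_{k=j+2}^{6m_n}\sum_{r=j+1}^{k-1} + \sum_{k=6m_n+1}^{n}\sum_{r=j+1}^{m_n+1} + \sum_{k=6m_n+1}^{n}\sum_{r=k-2l_n+1}^{k-1} + \sum_{k=6m_n+1}^{n}\sum_{r=m_n+2}^{k-2l_n}\Big) D_{j,k}D_{j,r}$$
$$= U_{1n} + U_{2n} + U_{3n} + U_{4n}. \qquad (12)$$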

We proceed to show that $U_{kn} = o_p(m_n^{1/2})$, $k = 1, 2, 3$. Note that the
summands in $U_{1n}$ form martingale differences. So



6m„ 



k=3 



(r-l)Am„ 



r=2 



0{m^^/n^) = o{mn). 



Regarding $U_{2n}$, we let
$\tilde U_{2n} = 2n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \sum_{k=6m_n+1}^{n}\sum_{r=j+1}^{m_n+1} \tilde D_{j,k}D_{j,r}$.
It is easy to show that $U_{2n} - \tilde U_{2n} = o_p(1)$ in view of (11). Further, by Lemma 5.2,



n nir, 



rrin+l m„+l 



ml) 



fc,A:'=6m„+l j,j'=l r=j+l r'=j'+l 

n rrin m„+l 



4(1 + 0(1)) 



,3 



E E ^njklf E IE(A,fc^/,fc)IE(^,,r4',r 

fc=6m„+l j,j'=l r=(j+l)V(j'+l) 



= 0{m^/n) = o{mn). 

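Thus $U_{2n} = o_p(m_n^{1/2})$. Concerning $U_{3n}$, since it is a martingale, we have

$$E(U_{3n}^2) = \frac{4}{n^2}\sum_{k=6m_n+1}^{n} E\Big(\sum_{j=1}^{m_n}\sum_{r=k-2l_n+1}^{k-1} k_{nj}^2 D_{j,k}D_{j,r}\Big)^2$$
$$\le \frac{C}{n^2}\sum_{k=6m_n+1}^{n}\Big(\sum_{j=1}^{m_n}\Big\|D_{j,k}\sum_{r=k-2l_n+1}^{k-1} D_{j,r}\Big\|\Big)^2 \le \frac{C}{n^2}\sum_{k=6m_n+1}^{n}\Big(\sum_{j=1}^{m_n}\|D_{j,k}\|_4 \Big\|\sum_{r=k-2l_n+1}^{k-1} D_{j,r}\Big\|_4\Big)^2.$$

Since $D_{j,s}$ are martingale differences for each $j$, we apply Burkholder's inequality
[Hall and Heyde (1980)] and get

$$\Big\|\sum_{r=k-2l_n+1}^{k-1} D_{j,r}\Big\|_4 \le C\Big\|\sum_{r=k-2l_n+1}^{k-1} D_{j,r}^2\Big\|^{1/2} \le C\Big(\sum_{r=k-2l_n+1}^{k-1} \|D_{j,r}\|_4^2\Big)^{1/2} \le C m_n^{1/2}.$$

Note that the constant $C$ in the above display does not depend on $j$. So
$E(U_{3n}^2) \le C m_n^3/n = o(m_n)$. Let
$\tilde U_{4n} = 2n^{-1}\sum_{j=1}^{m_n} k_{nj}^2 \sum_{k=6m_n+1}^{n}\sum_{r=m_n+2}^{k-2l_n} \tilde D_{j,k}\tilde D_{j,r}$.
Since $U_{4n} - \tilde U_{4n} = o_p(1)$ by (11), it remains to show
$\tilde U_{4n}/(2\sigma^8 m_n D(K))^{1/2} \to_D N(0,1)$ in view of (12).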
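
For reference, one standard form of Burkholder's inequality that is sufficient for the moment bounds in this appendix is recorded below; the notation $d_i$, $C_p$ is introduced here for illustration and the constant $C_p$ depends only on $p$.

```latex
% Burkholder's inequality for martingale differences d_1, ..., d_n in L^p, p > 1:
\[
\Big\| \sum_{i=1}^{n} d_i \Big\|_p
  \le C_p \Big\| \Big( \sum_{i=1}^{n} d_i^2 \Big)^{1/2} \Big\|_p .
\]
% For p >= 2, Minkowski's inequality in L^{p/2} then gives
\[
\Big\| \sum_{i=1}^{n} d_i \Big\|_p^2
  \le C_p^2 \sum_{i=1}^{n} \| d_i \|_p^2 .
\]
% With p = 4 and a block of 2 l_n - 1 = 4 m_n - 1 differences whose L^4 norms
% are uniformly bounded, the right hand side is O(m_n), so the L^4 norm of the
% block sum is O(m_n^{1/2}).
```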






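Write $\tilde U_{4n} = n^{-1}\sum_{k=6m_n+1}^{n} V_{nk}$, where
$V_{nk} := 2\sum_{r=m_n+2}^{k-2l_n}\sum_{j=1}^{m_n} k_{nj}^2 \tilde D_{j,k}\tilde D_{j,r}$.
Then $\{V_{nk}\}$ forms a sequence of martingale differences with respect to $\mathcal{F}_k$. By the
martingale central limit theorem, it suffices to verify the following conditions:

$$\sigma^2(n) := n^{-2}\sum_{t=6m_n+1}^{n} E(V_{nt}^2) = 2\sigma^8 m_n D(K)(1 + o(1)), \qquad (13)$$
$$\sum_{t=6m_n+1}^{n} E\{V_{nt}^2 1(|V_{nt}| > \epsilon n\sigma(n))\} = o(\sigma^2(n)n^2), \quad \epsilon > 0, \qquad (14)$$
$$\sigma^{-2}(n)\, n^{-2}\sum_{t=6m_n+1}^{n} \tilde V_{nt}^2 \to_p 1, \text{ where } \tilde V_{nt}^2 = E(V_{nt}^2|\mathcal{F}_{t-1}). \qquad (15)$$

By Lemma 5.2 and (11), we have

$$\sigma^2(n) = \frac{4}{n^2}\sum_{k=6m_n+1}^{n} E\Big(\sum_{r=m_n+2}^{k-2l_n}\sum_{j=1}^{m_n} k_{nj}^2 \tilde D_{j,k}\tilde D_{j,r}\Big)^2$$
$$= \frac{4}{n^2}\sum_{k=6m_n+1}^{n}\sum_{r,r'=m_n+2}^{k-2l_n}\sum_{j,j'=1}^{m_n} k_{nj}^2 k_{nj'}^2 E(\tilde D_{j,k}\tilde D_{j',k}\tilde D_{j,r}\tilde D_{j',r'})$$
$$= \frac{4}{n^2}\sum_{k=6m_n+1}^{n}\sum_{r=m_n+2}^{k-2l_n}\sum_{j,j'=1}^{m_n} k_{nj}^2 k_{nj'}^2 E(\tilde D_{j,k}\tilde D_{j',k})E(\tilde D_{j,r}\tilde D_{j',r})$$
$$= \frac{4}{n^2}\sum_{k=6m_n+1}^{n}\sum_{r=m_n+2}^{k-2l_n}\sum_{j,j'=1}^{m_n} k_{nj}^2 k_{nj'}^2 E(D_{j,k}D_{j',k})E(D_{j,r}D_{j',r}) + o(1)$$
$$= \frac{4}{n^2}\sum_{k=6m_n+1}^{n}\sum_{r=m_n+2}^{k-2l_n}\sum_{j=1}^{m_n} k_{nj}^4 E(D_{j,k}^2)E(D_{j,r}^2)(1 + o(1))$$
$$= 2\sigma^8 m_n D(K) + o(m_n). \qquad (16)$$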



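For (14), again by Burkholder's inequality, we get

$$E(V_{nt}^4) = 16 E\Big(\sum_{r=m_n+2}^{t-2l_n}\sum_{j=1}^{m_n} k_{nj}^2 \tilde D_{j,t}\tilde D_{j,r}\Big)^4 \le C m_n^3 \sum_{j=1}^{m_n} E\Big[\tilde D_{j,t}^4 \Big(\sum_{r=m_n+2}^{t-2l_n} \tilde D_{j,r}\Big)^4\Big]$$
$$\le C m_n^3 \sum_{j=1}^{m_n} E(\tilde D_{j,t}^4)\, E\Big(\sum_{r=m_n+2}^{t-2l_n} \tilde D_{j,r}\Big)^4 \le C m_n^4 t^2,$$

which implies (14). To show (15), we let
$\tilde V_n^2 = n^{-2}\sum_{t=6m_n+1}^{n} \tilde V_{nt}^2$, where

$$\tilde V_{nt}^2 = 4\sum_{r,r'=m_n+2}^{t-2l_n}\sum_{j,j'=1}^{m_n} k_{nj}^2 k_{nj'}^2 E(\tilde D_{j,t}\tilde D_{j',t}|\mathcal{F}_{t-1})\tilde D_{j,r}\tilde D_{j',r'}.$$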

Then we can write

. n t—2l„ m,i 

Vn-<rHn) = I E E E^^nA (17) 

t=6mn+l r,r'=mn+2 j,j'=l 

{¥.{[b,^tDr,t - Dj^tDj>,t]\rt-i)bj,rDr,r' 

5 

+E{Dj,tDf,tMDj,rbfy)} - a\n) =: Jkn - a^ii). 

k=l 

By a similar argument as in (16), $J_{5n} = \sigma^2(n)(1 + o(1))$. So (15) follows if we
can show $\sigma^{-2}(n)J_{kn} = o_p(1)$ for $k = 1, \cdots, 4$. By (11), $J_{1n} = o_p(m_n)$. As to $J_{2n}$,
it follows from Lemma 5.2 and (11) that uniformly in $j, j' = 1, 2, \cdots, m_n$,

< 2\\Dj^iDj,^i - D'j iD'j, i\\ < Cp""" = 0(n-'^) for any k > 0. 

So $J_{2n} = o_p(m_n)$. Lemmas 5.3 and 5.4 assert that $J_{3n} = o_p(m_n)$ and $J_{4n} = o_p(m_n)$
respectively. Thus (15) holds and the conclusion follows.

Lemma 5.3. Under the assumptions in Theorem 2.1, the random variable $J_{3n}$
(as defined in (17)) is $o_p(m_n)$.

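Proof of Lemma 5.3. Let
$M(j,j';t) = E(D_{j,t}D_{j',t}|\mathcal{F}_{t-l+1}^{t-1}) - E(D_{j,t}D_{j',t})$ and

$$\tilde J_{3n} = \frac{4}{n^2}\sum_{t=6m_n+1}^{n}\sum_{r,r'=m_n+2}^{t-2l_n}\sum_{j,j'=1}^{m_n} k_{nj}^2 k_{nj'}^2 M(j,j';t)D_{j,r}D_{j',r'}.$$

It is easy to see that $J_{3n} = \tilde J_{3n} + o_p(m_n)$ in view of (11). For notational convenience,
denote by $H_D(j,t) = \sum_{r=m_n+2}^{t-2l_n} D_{j,r}$ and $H_Z(j,t) = \sum_{r=m_n+2}^{t-2l_n} Z_{jr}$.
Write $\tilde J_{3n} = J_{31n} + J_{32n} + J_{33n}$, where

$$J_{31n} = \frac{4}{n^2}\sum_{t=6m_n+1}^{n}\sum_{j,j'=1}^{m_n} k_{nj}^2 k_{nj'}^2 M(j,j';t)(H_D(j,t) - H_Z(j,t))H_D(j',t),$$
$$J_{32n} = \frac{4}{n^2}\sum_{t=6m_n+1}^{n}\sum_{j,j'=1}^{m_n} k_{nj}^2 k_{nj'}^2 M(j,j';t)H_Z(j,t)(H_D(j',t) - H_Z(j',t)),$$
$$J_{33n} = \frac{4}{n^2}\sum_{t=6m_n+1}^{n}\sum_{j,j'=1}^{m_n} k_{nj}^2 k_{nj'}^2 M(j,j';t)H_Z(j,t)H_Z(j',t).$$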

We shall first prove $J_{31n} = o_p(m_n)$. Since $M(j,j';t)$ is $l_n$-dependent with respect
to $t$, we get by the Cauchy-Schwarz inequality,

^(4n) < E E E Il^i.(jl,t)-i^z(jl,t)|l4 

t=6m„+l t'=(6m„+l)V(i-/„) ii,i;,i2,j2=l 

||i/^(j2,0-^z(j2,0||4ll^^(i^*)|l4 11^^02,0114- 

Since the summands in $H_D(j,t)$ form martingale differences, we apply Burkholder's
inequality and obtain

$$\|H_D(j,t)\|_4^4 \le C E\Big(\sum_{r=m_n+2}^{t-2l_n} D_{j,r}^2\Big)^2 \le Ct^2, \quad j = 1, 2, \cdots, m_n. \qquad (18)$$

Applying Lemma 5.1 and the fact that $\delta_8(k) \le Cr^k$ for some $r \in (0,1)$, we get

1/2 

5m„ — 1 

E ®Im,4 

ki=l 

(\ 1/2 
f— 5m„ — 1 oo \ 
5^ i6s{h) + 6s{h - n)l{h > n)) < Cm^. (19) 
fci=i /i=fci / 

Therefore, in view of (18) and (19), we obtain $E(J_{31n}^2) \le Cm_n^6/n^2 = o(m_n^2)$. To
show $J_{32n} = o_p(m_n)$, we note that

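$$\|H_Z(j,t)\|_4^4 = \sum_{r_1,r_2,r_3,r_4=m_n+2}^{t-2l_n} E(Z_{jr_1}Z_{jr_2}Z_{jr_3}Z_{jr_4})$$
$$= \sum_{r_1,r_2,r_3,r_4=m_n+2}^{t-2l_n} \{\mathrm{cov}(Z_{jr_1}, Z_{jr_2})\mathrm{cov}(Z_{jr_3}, Z_{jr_4}) + \mathrm{cov}(Z_{jr_1}, Z_{jr_3})\mathrm{cov}(Z_{jr_2}, Z_{jr_4})$$
$$\qquad + \mathrm{cov}(Z_{jr_1}, Z_{jr_4})\mathrm{cov}(Z_{jr_2}, Z_{jr_3}) + \mathrm{cum}(Z_{jr_1}, Z_{jr_2}, Z_{jr_3}, Z_{jr_4})\}. \qquad (20)$$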
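
The decomposition in (20) is the standard identity that expresses a fourth moment of zero-mean variables as the three pairwise covariance products plus the fourth joint cumulant. The sketch below evaluates the three pieces with empirical averages playing the role of expectations; the function name is hypothetical and the inputs are assumed to have (empirical) mean zero.

```python
import numpy as np

def fourth_moment_terms(a, b, c, d):
    """Return (E[abcd], sum of the three covariance pairings, cum(a, b, c, d)).

    Expectations are replaced by sample averages, and the inputs are assumed
    to be centered, so that E[xy] equals cov(x, y).
    """
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    m4 = np.mean(a * b * c * d)
    pairings = (np.mean(a * b) * np.mean(c * d)
                + np.mean(a * c) * np.mean(b * d)
                + np.mean(a * d) * np.mean(b * c))
    # The fourth joint cumulant is what remains after removing the pairings.
    return m4, pairings, m4 - pairings
```

For a symmetric two-point variable taking the values +1 and -1 with equal weight, the fourth moment is 1, the pairings sum to 3, and the cumulant is -2; for two independent such variables, the mixed cumulant cum(a, a, b, b) vanishes.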

Since $\{u_t\}$ are uncorrelated and the $k$-th ($k = 2, 3, \cdots, 8$) joint cumulants are
absolutely summable under GMC(8) [see Wu and Shao (2004), Proposition 2], it
is not hard to see that $\|H_Z(j,t)\|_4^4 \le Ct^2$. Following the same argument as in the
derivation of $E(J_{31n}^2)$, we can derive $E(J_{32n}^2) = o(m_n^2)$, so $J_{32n} = o_p(m_n)$.
It remains to show that $J_{33n} = o_p(m_n)$. Note that

„ n nh{t+ln) m„ t-2Z„ t'-2Z„ 

nJhj < ^ E E E E E 

i=6m„+l £' = (6m„+l)V{t'-«„) Ji,j(,j2,j2=l »'i.^'2=mn+2 ri,r^=mn+2 

^ n nh{t+ln) 
\K{Zj^r-iZj.^r2Zj[r[Zf^r'^)\ < ^ ^ ^ Hn{t,t'). 

t=6m„+l t' = {emn+l)V{t'-ln) 

Following (20), we can write $E(Z_{j_1r_1}Z_{j_2r_2}Z_{j_1'r_1'}Z_{j_2'r_2'})$ as a sum of four components,
which implies $H_n(t,t') = \sum_{k=1}^{4} H_{kn}(t,t')$. For $H_{1n}(t,t')$, it follows from the
absolute summability of the 4-th cumulant that

m„ t-2l„ t'-2l„ 

Hln{t,t') = X] XI X] l{cOv(nr.i,Ur-2)cOv(Ur.i_ji,nr2-j2) 

jij'i,j2,j2=l r-i,r-2=m„+2 r[,r!2=m„+2 

-FCUm(Uri, Mri-ii, ■"r2, '"r2-j2)}{cOv(Uri> '"r^)cOv(u^^„j/ , U^/^.jV ) 

By the same argument, we have Hkn{t,t') < Crri^it V t')^, k = 2,2,. Regarding 
$H_{4n}(t,t')$, we apply the product theorem for the joint cumulants [Brillinger (1975)]
and write

$$\mathrm{cum}(Z_{j_1r_1}, Z_{j_2r_2}, Z_{j_1'r_1'}, Z_{j_2'r_2'}) = \sum_{v} \mathrm{cum}(u_{i_j}, i_j \in v_1) \cdots \mathrm{cum}(u_{i_j}, i_j \in v_p),$$

where the summation is over all indecomposable partitions $v = v_1 \cup \cdots \cup v_p$ of the
following two-way table

    r_1     r_1 - j_1
    r_2     r_2 - j_2
    r_1'    r_1' - j_1'
    r_2'    r_2' - j_2'

Again by the absolute summability of $k$-th ($k = 2, \cdots, 8$) cumulants, we get
H4n{t,t') < Cml{tyt'f. Therefore, E(J|3„) < Cml/n = o{ml) and J33„ = 
Op{mn). Thus the conclusion is established. 

Lemma 5.4. Under the assumptions in Theorem 2.1, the random variable $J_{4n}$
(as defined in (17)) is $o_p(m_n)$.

Proof of Lemma 5.4. Write $J_{4n} = J_{41n} + J_{42n}$, where

f =6m„+l r-,r'=m„+2 j,j'=l,j=^j' 
. n t—2l„ nin 

•^42n = E E E^nJ-^(^'<)[A.rA-,r'-IE(A,rA-,r')]- 

i=6m„+l r,r'=mn+2 j=l 

Note that 

n t\-2ln t2—2ln m„ m„ 

nJln) = o{n-^) E E E E E 

ti,t2=6m„+l ri,ri=m„+2 r2/4=m„+2 ji j(=l ji^^jj j2,j^=l,j2 5^J^ 

^ni2 ^Ij'^^i^h ,ti Dj[ ,ti )^{Dj2 ,t2 ,t2 ){coviDn > A2 ,r2 )cov (L)^. , i)^., ) 

By Lemma 15.21 and (jlip , the first two terms in the curly bracket above contribute 
0{mn)- Since cnm[Dj^^ri , Dj[.r[ 1 Dj2,r2, Dj/^y) vanishes when any two neighboring 
indices (say, {ri,r[), {r[,r2) and (r2,r2) if ri > r[ > r2 > are more than In 
apart, the third term is 0(/j^/n) = o(m^). So Jun = Op{mn). Concerning Ji2n, 
we have Ji2n = JA2in + JA22n, where 

, n t—2l„ m„ 

•^^^m = -2 E E EktM^ltmr-mlr)], 

t=6m„+l r=m„+2 j=l 
^ n t-2l„ r—1 nin 

•^422n = E E E E^njnDlt)[Dj,rD,y -nD,,rD,y)]. 

t=6m„+l r=m„+3 r'=m„+2 j=l 

Since is /^-dependent, we can easily derive E( J|2]^„) = 0{m'^/n), which implies 
-'421n = Op{mn). Let 

n. t-2Z„ r—1 m„ 

•^422n = ^ E E E Ef'njmDD.^rD.y. 

t=6m„+l r=m„+3 r'=m„+2 j=l 






Then by ([n]), J422n-^422n = Op(l). Since for each j, {Er=m1+3 Er'=m„+2 Dj,rDj,r'} 

form martingale differences with respect to J^t_2i„ ) we get 



j=l \t=6mn+lr=mn+3r'=m„+2 

2 

Crun 



t~2ln r-1 



< 



Crrir 



rr 



■E E ^ E E D^^D^^r 



J=l i=6m„+l \ r-=m„+3 r'=m„+2 



m„ n t— 2i„ 

E E E ^ 

j=l t=6m„+l r=m„ +3 



r-1 



E ^^■^ 



, r'=m„+2 



where we have applied the fact that for each $j$, $\{\sum_{r'=m_n+2}^{r-1} D_{j,r}D_{j,r'}\}$ is a sequence
of martingale differences with respect to $\mathcal{F}_r$. By the Cauchy-Schwarz inequality
and Burkholder's inequality,



E 



r-l 

Dlr{ E 

r'=m„+2 







r-l 


2 


)] 


< c 


E 


< C(r -mn-2) 






r'=m„+2 


4 



Thus IE(J|22n) — Crri^/n = o{mf^), in other words, Ji22n = Op{mn)- The proof is 
complete. 





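5.2 Appendix B

Throughout Appendix B, we let $u_t(\theta) = \sum_{k=0}^{\infty} e_k(\theta)Y_{t-k}$ and
$\hat u_t = \sum_{k=0}^{t-1} e_k(\hat\theta_n)Y_{t-k}$, $t = 1, 2, \cdots, n$. Write
$\hat u_t = u_t + \Delta_{nt}$, where $\Delta_{nt} = \Delta_{1t} + \Delta_{2nt}$,
$\Delta_{1t} = -\sum_{k=t}^{\infty} e_k(\theta_0)Y_{t-k} = \sum_{k=0}^{\infty} \psi_{k,t}u_{-k}$ and
$\Delta_{2nt} = \sum_{k=0}^{t-1} (e_k(\hat\theta_n) - e_k(\theta_0))Y_{t-k}$. Denote by
$e_{k;m_1}(\theta) = \partial e_k(\theta)/\partial\theta_{m_1}$ and
$e_{k;(m_1,m_2)}(\theta) = \partial^2 e_k(\theta)/\partial\theta_{m_1}\partial\theta_{m_2}$ for any
$m_1, m_2 \in \{1, 2, \cdots, p+q+1\}$ and assume they are the same as those expressions
in Lemma 5.7 without loss of generality.

Lemma 5.5. Under the assumptions in Theorem 3.2, we have
(a) $n\sum_{j=1}^{m_n} k_{nj}^2 \hat\rho_{\hat u}^2(j) = n\sigma^{-4}\sum_{j=1}^{m_n} k_{nj}^2 \hat R_{\hat u}^2(j) + o_p(m_n^{1/2})$, and
(b) $n\sum_{j=1}^{m_n} k_{nj}^2 (\hat R_{\hat u}^2(j) - \tilde R_{\hat u}^2(j)) = o_p(m_n^{1/2})$,
where $\tilde R_{\hat u}(j) = n^{-1}\sum_{t=|j|+1}^{n} \hat u_t \hat u_{t-|j|}$.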
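
Proof of Lemma 5.5. To prove (a), it suffices to show that

$$\hat R_{\hat u}(0) = n^{-1}\sum_{t=1}^{n} \hat u_t^2 - \Big(n^{-1}\sum_{t=1}^{n} \hat u_t\Big)^2 = \sigma^2 + O_p(n^{-1/2}). \qquad (21)$$

To this end, let $G_{1n} = n^{-1}\sum_{t=1}^{n} u_t\Delta_{1t}$, $G_{2n} = n^{-1}\sum_{t=1}^{n} \Delta_{1t}^2$ and
$G_{3n} = n^{-1}\sum_{t=1}^{n} \Delta_{2nt}^2$. Since $n^{-1}\sum_{t=1}^{n} (u_t^2 - \sigma^2) = O_p(n^{-1/2})$,
(21) follows if we can show $G_{1n} = O_p(n^{-1/2})$, $G_{2n} = O_p(n^{-1/2})$ and
$G_{3n} = O_p(n^{-1})$. Note that

$$E(G_{1n}^2) = n^{-2}\sum_{t,t'=1}^{n}\sum_{k,k'=0}^{\infty} \psi_{k,t}\psi_{k',t'}E(u_t u_{t'}u_{-k}u_{-k'})$$
$$= n^{-2}\sum_{t=1}^{n}\sum_{k=0}^{\infty} \psi_{k,t}^2\sigma^4 + n^{-2}\sum_{t,t'=1}^{n}\sum_{k,k'=0}^{\infty} \psi_{k,t}\psi_{k',t'}\mathrm{cum}(u_t, u_{t'}, u_{-k}, u_{-k'})$$
$$= O(\log n/n^2 + n^{-1}) = O(n^{-1}),$$

where we have applied the fact that $\sum_{k=0}^{\infty} \psi_{k,t}^2 = O(t^{-1})$ [cf. Robinson (2005)]
and the absolute summability of the 4-th cumulants. Since $E(G_{2n}) = O(\log n/n)$,
$G_{2n} = O_p(n^{-1/2})$. To show $G_{3n} = O_p(n^{-1})$, we apply the mean-value theorem and
get $e_k(\hat\theta_n) - e_k(\theta_0) = \sum_{m_1=1}^{p+q+1} e_{k;m_1}(\theta_{kn})(\hat\theta_n^{(m_1)} - \theta_0^{(m_1)})$,
where $\theta_{kn} = \theta_0 + \beta_k(\hat\theta_n - \theta_0)$ for some $\beta_k \in (0,1)$. Then

$$nG_{3n} = \sum_{t=1}^{n}\sum_{k,k'=0}^{t-1} (e_k(\hat\theta_n) - e_k(\theta_0))(e_{k'}(\hat\theta_n) - e_{k'}(\theta_0))Y_{t-k}Y_{t-k'}$$
$$= \sum_{t=1}^{n}\sum_{k,k'=0}^{t-1}\sum_{m_1,m_1'=1}^{p+q+1} (\hat\theta_n^{(m_1)} - \theta_0^{(m_1)})(\hat\theta_n^{(m_1')} - \theta_0^{(m_1')})e_{k;m_1}(\theta_{kn})e_{k';m_1'}(\theta_{k'n})Y_{t-k}Y_{t-k'}.$$

When $\hat\theta_n \in \Theta_\delta$, by Lemma 5.7, for any $(m_1, m_1') \in \{1, \cdots, p+q+1\}^2$,

$$\sum_{t=1}^{n}\sum_{k,k'=0}^{t-1} |e_{k;m_1}(\theta_{kn})||e_{k';m_1'}(\theta_{k'n})|E|Y_{t-k}Y_{t-k'}| = O(n).$$

Since $\hat\theta_n - \theta_0 = O_p(n^{-1/2})$, we have $P(\hat\theta_n \notin \Theta_\delta) \to 0$. Consequently
$nG_{3n} = nG_{3n}1(\hat\theta_n \in \Theta_\delta) + nG_{3n}1(\hat\theta_n \notin \Theta_\delta) = O_p(1)$.
Therefore part (a) is proved.

As to part (b), write
$\hat R_{\hat u}(j) - \tilde R_{\hat u}(j) = -n^{-1}\bar{\hat u}(\sum_{t=1}^{n-j} \hat u_t + \sum_{t=j+1}^{n} \hat u_t) + (1 - j/n)\bar{\hat u}^2$,
where $\bar{\hat u} = n^{-1}\sum_{t=1}^{n} \hat u_t$. Following the argument for part (a), it is
straightforward to show that $\bar{\hat u} = O_p(n^{-1/2})$ and
$\sum_{j=1}^{m_n} k_{nj}^2 (\sum_{t=1}^{n-j} \hat u_t + \sum_{t=j+1}^{n} \hat u_t)^2 = O_p(nm_n)$.
So $n\sum_{j=1}^{m_n} k_{nj}^2 (\hat R_{\hat u}(j) - \tilde R_{\hat u}(j))^2 = o_p(1)$. Applying the Cauchy-Schwarz
inequality, part (b) follows.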
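
Proof of Theorem 3.2. By Lemma 5.5, we only need to show that

$$\frac{n\sum_{j=1}^{m_n} k_{nj}^2 \tilde R_{\hat u}^2(j) - \sigma^4 m_n C(K)}{(2\sigma^8 m_n D(K))^{1/2}} \to_D N(0,1).$$

Note that $\tilde R_{\hat u}^2(j) - \tilde R_u^2(j) = (\tilde R_{\hat u}(j) - \tilde R_u(j))^2 + 2\tilde R_u(j)(\tilde R_{\hat u}(j) - \tilde R_u(j))$.
By Theorem 2.1, it suffices to show

$$n\sum_{j=1}^{m_n} k_{nj}^2 (\tilde R_{\hat u}(j) - \tilde R_u(j))^2 = o_p(1),$$

since it implies $n\sum_{j=1}^{m_n} k_{nj}^2 \tilde R_u(j)(\tilde R_{\hat u}(j) - \tilde R_u(j)) = o_p(m_n^{1/2})$ by the
Cauchy-Schwarz inequality. To this end, we note that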

rrin ^ m„ ( / n \^ f 



2- 



n . , , , 

j=l j=l \t=j+l j \i=j+l 

n \ f ^ \ f ^ 

^t=j+l J \t=j+l J \t=j+l 

= '■ C{Lin + L2n + Lsn + L^n + -^^5n)- 

We proceed to show that $L_{kn} = o_p(1)$, $k = 1, \cdots, 5$. First,

rrin / n oo 

,tU-kUt~j 

j=l \t=j+l k=0 

m.n n oo oo 

j=l t,t'=j+l k=0 k'=0 
m.n n oo oo 

= '^^^E E E i^k,ti>k',t'{cov{u^k-,u^k')cav{ut^j,ut'^j) 

j=lt,t'=j+lk=Ok'=0 

+cum('u_fc, u_k',ut-j,ut'-~j)}, 

where the first term above is $(\sigma^4/n)\sum_{j=1}^{m_n}\sum_{t=j+1}^{n}\sum_{k=0}^{\infty} \psi_{k,t}^2 = O(m_n\log n/n)$.
Applying Proposition 2 in Wu and Shao (2004), we have
$|\mathrm{cum}(u_{-k}, u_{-k'}, u_{t-j}, u_{t'-j})| \le Cr^{t \vee t' - j + k \vee k'}$ for some $r \in (0,1)$.
So the second term in $E(L_{1n})$ is bounded by

TUn n OO OO 

E E E E = 0{mjn). 

j=l t,t'=j+l k=0 k'=0 
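
Following the same argument, we get $E(L_{3n}) = O(m_n/n) = o(1)$.
To show $L_{5n} = o_p(1)$, we note that

$$L_{5n} = \frac{1}{n}\sum_{j=1}^{m_n}\Big(\sum_{t=j+1}^{n} \Delta_{nt}\Delta_{n(t-j)}\Big)^2 \le \frac{C}{n}\sum_{j=1}^{m_n}\Big\{\Big(\sum_{t=j+1}^{n} \Delta_{1t}\Delta_{1(t-j)}\Big)^2 + \Big(\sum_{t=j+1}^{n} \Delta_{1t}\Delta_{2n(t-j)}\Big)^2$$
$$\qquad + \Big(\sum_{t=j+1}^{n} \Delta_{2nt}\Delta_{1(t-j)}\Big)^2 + \Big(\sum_{t=j+1}^{n} \Delta_{2nt}\Delta_{2n(t-j)}\Big)^2\Big\} =: C(L_{51n} + L_{52n} + L_{53n} + L_{54n}). \qquad (22)$$

As to $L_{51n}$, we have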



i=i t,t'=j+i 



^ rrin n oo 

— fcl^— fc2^ — A:3^ — fc4 J 

j=l t,t'=j+l ki,k2,k3M=0 
^ r?i„ n oo 

X] X] ^fci,tV'fc2,t'V'fc3,t-iV'fc4,i'-i{cov(n 



n 

j=l t,t'=j+l ki,k2,k3,k4=0 
COv(u_fc3,M„fcJ + COv(u_fcj,U„fc3)cOv('U_fc2,U_fcJ 

+cov(n_fc, , n_fc Jcov('u_fc2 , n_fc3 ) + cum{u.ki , ''^-fca > ^-fcs ' ''^-fc4 )} • 

Since $\sum_{k=0}^{\infty} \psi_{k,t}^2 \le Ct^{-1}$ [cf. Robinson (2005)], the first three terms above are
$O(m_n\log^2 n/n)$ under the null hypothesis. By Proposition 2 in Wu and Shao
(2004), $|\mathrm{cum}(u_{-k_1}, u_{-k_2}, u_{-k_3}, u_{-k_4})| \le Cr^{\max(k_1,k_2,k_3,k_4) - \min(k_1,k_2,k_3,k_4)}$
for some $r \in (0,1)$. Thus the fourth term above is bounded by

^ nin n oo 

~YY^ IV'fci,tV'fc2,t'V'fc3,t-i^fc4,i'-jk^'"''* 

j=l t,t'=j+l ki>k2>k3>k4,=0 

^ mn n oo oo oo 

-~Y1 Y Y Y \'^k2+hi,ti^k2,t'\ Y \^ki+hz,t-j^ka'~3V^^^^^ 

j=lt,t'=j+lhi,h3=0k2=0 fc4=0 
^ mn n oo 

j=lt,t'=j+l hi,h3=0 

Lemma 5.6 asserts that $L_{52n} = o_p(1)$ and the same argument leads to $L_{53n} = o_p(1)$.
Following the line as in the derivation of $G_{3n}$ (see Lemma 5.5), we can derive
$L_{54n} = O_p(m_n/n) = o_p(1)$. Thus $L_{5n} = o_p(1)$ and a similar and simpler argument
yields $L_{kn} = o_p(1)$, $k = 2, 4$. We omit the details. The conclusion is established.

Lemma 5.6. Under the assumptions in Theorem 3.2, the random variable
$L_{52n} = n^{-1}\sum_{j=1}^{m_n} (\sum_{t=j+1}^{n} \Delta_{1t}\Delta_{2n(t-j)})^2$ as defined in (22) is $o_p(1)$.

Proof of Lemma 5.6. We apply a Taylor's expansion for each $k$ and obtain

p+q+l 
mi=l 

p+q+l 
mi ,m2=l 

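where $\theta_{kn} = \theta_0 + \alpha_k(\hat\theta_n - \theta_0)$ for some $\alpha_k \in (0,1)$. By Lemma 5.7,
$|e_{k;m_1}(\theta_0)| \le Ck^{-1-\epsilon}$ and $\sup_{\theta \in \Theta_\delta} |e_{k;(m_1,m_2)}(\theta)| \le Ck^{-1-\epsilon}$
for some $\epsilon > 0$. Denote by $e_k(\theta_0) = e_k$ and $e_{k;m_1}(\theta_0) = e_{k;m_1}$. Since
$e_0(\theta) = 1$, we have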



L^2n = — E E ^lti^U2^2n{ti-j)^2n[t2-j) 
j=lh,t2=j+l 
^ rUn n oo ti—j — lt2—j — l 

-EE E E E V'fci,tiV'te, 



n 

i = l tl,t2=j + lfcl,fc2=0 fc3 = l A;4 = i 

{ek30n) - e.kz{0o)){(iki{0n) - ek^{Oo))Yt^^j^k-^Yt,^^j^ki 



"EE E E E V'fcl,tlV'fc2,t2^^-fcl^-fc2^tl-i-fc3^t2-j-fc4 

j=lti,t2=j + lfci,fc2=0 k-i = l k4, = l 

p+q+l p+q+l 

E (^n"'^^ ~ ^o"'^^)^fc3;mi + E (^n™^^ ~ ^o"'^^)^fe3;(mi,m2) (^^3") 
mi=l mi,m2=l 

(p+q+l p+q+l 
Y.{e[r^-9^r^)ek,;n.,+ E (^^^^-^^^) 
m3=l m3,m4=l 

4 
/i=l 

Write $Y_t = \sum_{k=0}^{\infty} a_k u_{t-k}$. To show $L_{521n} = o_p(1)$, it suffices to show that for any
$(m_1, m_3) \in \{1, \cdots, p+q+1\}^2$,

m„ n oo ti—j — lt2—j — l 

fifes ;mi ^k4;m3'^—kiU—k2 

j = l tl,t2=j+l ki,k2=0 k3 = l fc4 = l 

m„ n oo ti~j~lt2~-j-~l oo 

^tl-i-fc3^<2-i-fc4 = X X X X X X V'fcl,tlV'fc2,t2 

i = l tl,t2=i+lfcl,fc2=0 fc3 = l fc4=l /ll,/i2=0 

O/ii fl/i2 fifc3;mi Cfc4;m3'U'— fci fc2^ti~i~fc3~^i ^t2~i~fc4~^2 — Op(7l ). 

Note that 

m„ n n oo ti-j-l t2-i-l 

E(iLJ=E EE E E E E 

jj' = ltl,t2=i+lt'j,ti,=j'+lfci,fc2,fci,fc^,=0 fc3 = l fc4 = l fc;j = i 
i2-i'-l oo 

E E '^kiM'^k2,t2'^k[,t['4^k'2 ,t'2 ^hi O/i'j^ O^h,^ 6^3 ;mi 6fc4 ;m,3 ^k'^]m\ ^k'^;m3 

k'^=l hi,h2,h'i,h'2=0 

E{U-kiU-k2Uti-j-k3-hiUt2-j-k4-h2'>J'-k[U_k'2'>J't[-j'-k'^-h[Ut'^-j'-k'^-h'2) 
rrin n n 00 00 00 

E E E E E 

jj'=l ti,t2=j+lt[,t'2=j'+l fci,fc2,fci,^=0 k3,k4,k'^,k'^=l hi,h2,h[,h'2=0 

I V'fci ,ti V'fe ,t2 1 1 V'fci i^k'^ ,1'^ 1 1 a/ii ah2 1 1 Oh'i O/iJ, 1 1 ^3 ^^3^4 r ^"'11, 

where 

II = |]E(u_fc^M_fc2'"tl-j-/C3-/ll^^t2-j-fe4-/l2"-fci'"-fc^,^t;-j'-fe^-/li"t^-i'^ 

= ^ cum('Ui^ , «i G 51) • • • cum(ni^. , ij G ^p) . 

In the above equation, $\sum_g$ is over all partitions $g = \{g_1 \cup \cdots \cup g_p\}$ of the index set
$\{-k_1, t_1-j-k_3-h_1, -k_1', t_1'-j'-k_3'-h_1', -k_2, t_2-j-k_4-h_2, -k_2', t_2'-j'-k_4'-h_2'\}$.
Since $E(u_t) = 0$, only partitions $g$ with $\#g_i > 1$ for all $i$ contribute. We shall divide
all contributing partitions into the following several types and treat them one by
one.

1. $\#g_1 = \#g_2 = \#g_3 = \#g_4 = 2$. One such term is

$$\mathrm{cov}(u_{-k_1}, u_{t_1-j-k_3-h_1})\,\mathrm{cov}(u_{-k_1'}, u_{t_1'-j'-k_3'-h_1'})\,\mathrm{cov}(u_{-k_2}, u_{t_2-j-k_4-h_2}) \times \mathrm{cov}(u_{-k_2'}, u_{t_2'-j'-k_4'-h_2'}),$$

which is nonzero when $-k_1 = t_1-j-k_3-h_1$, $-k_1' = t_1'-j'-k_3'-h_1'$,
$-k_2 = t_2-j-k_4-h_2$ and $-k_2' = t_2'-j'-k_4'-h_2'$. Define $a_h = 0$ if $h < 0$.
Then for any fixed $g \in \mathbb{Z}$, $\sum_{h=0}^{\infty} |a_h a_{h+g}| \le \sum_{h=0}^{\infty} a_h^2 := S_a < \infty$. For any
fixed $t_1, t_1', t_2, t_2', j, j', k_3, k_4, k_3', k_4'$, by the Cauchy-Schwarz inequality,

$$\sum_{k_1,k_2,k_1',k_2'=0}^{\infty} |\psi_{k_1,t_1}\psi_{k_2,t_2}||\psi_{k_1',t_1'}\psi_{k_2',t_2'}||a_{k_1+t_1-j-k_3}a_{k_2+t_2-j-k_4}||a_{k_1'+t_1'-j'-k_3'}a_{k_2'+t_2'-j'-k_4'}|$$
$$\le \Big(\sum_{k_1,k_2,k_1',k_2'=0}^{\infty} \psi_{k_1,t_1}^2\psi_{k_2,t_2}^2\psi_{k_1',t_1'}^2\psi_{k_2',t_2'}^2\Big)^{1/2} S_a^2 = O((t_1t_2t_1't_2')^{-1/2}).$$

So this term is 0{nn?^v?) = o{n^). Similarly, all non-vanishing terms involve 
four restrictions on the indices ki,k2,k[,k2,hi,h2, h'^, h'2 once we fix ti , t'^, 
^2) ^2' ii i'; ^3; ^4' ^3' ^4- '^^^ Contribution from these terms are of order 
o{n^). 

2- #51 = #92 = 3, #53 = 2. A typical term is 

cum(n_fc, , -j-ks-hi , U-k[ )cum(nt/ , u_k2 , Ut2-j-ki-h2) 

xcov(u_fc/,nt^__j,_fc/„/,/^). 

So for any fixed ti,t'i,t2,t2,j,j',ks,k4,k(^,k'i, 

oo 

X] IV'fci,ti'0fci,t'iO,jJ|cum('u_fcj,ntj_j_fc3_,,^,'u_fc/)| (23) 

ki,k[,hi=0 
oo 

Consider the case —k[ > —ki > ti — j — k^ — hi. Then the corresponding 
term above is 

oo 

si,S2,ki=0 

where we have applied the Cauchy-Schwarz inequality and the fact that
$\sum_{k=0}^{\infty} \psi_{k,t}^2 = O(t^{-1})$. Other cases can be treated in a similar fashion. So (23)
is $O((t_1t_1')^{-1/2})$. Similarly, we can show that
oo 

and 



-1/2^ 

W . Ui—L'^ . Uj+ i J I — I 60 



Thus these terms contribute 0{m^n ) = o(n ). 

3. #51 = #92 = 4; #51 = 4, #^72 = #53 = 2; #51 = 5, #52 = 3; #51 = 
6, #52 = 2 and #51 = 8. Following a similar argument as the second case, 
it is not hard to see that the contribution of all these terms are o(n^). 

So L^2in = Op{l). Under the assumption that ut is GMC(8), it is not hard to 
show that E(y/) < 00, and sup^gjs^EAl^ < 00; compare the derivation of E(L5i„) 
in the proof of Theorem 13. 2[ Together with Lemma 15.71 we have E|L522n|l(^n £ 
@s) = 0{mn/v}/'^) = 0(1), so L522n = Op{l). Similarly we derive L^2kn = Op(l), 
A; = 3, 4. Now the proof is complete. 

The following lemma is an extension of Lemma A.1 of Francq and Zakoïan
(2000) to the FARIMA model.

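Lemma 5.7. For any $\theta \in \Theta_\delta$ and any $(m_1, m_2) \in \{1, \cdots, p+q+1\}^2$, there
exist absolutely summable sequences $(e_k(\theta))_{k \ge 0}$, $(e_{k;m_1}(\theta))_{k \ge 1}$ and
$(e_{k;(m_1,m_2)}(\theta))_{k \ge 1}$ such that almost surely

$$u_t(\theta) = \sum_{k=0}^{\infty} e_k(\theta)Y_{t-k}, \qquad \partial u_t(\theta)/\partial\theta_{m_1} = \sum_{k=1}^{\infty} e_{k;m_1}(\theta)Y_{t-k}$$

and

$$\partial^2 u_t(\theta)/\partial\theta_{m_1}\partial\theta_{m_2} = \sum_{k=1}^{\infty} e_{k;(m_1,m_2)}(\theta)Y_{t-k}.$$

Further, there exists an $\epsilon > 0$, such that

$$\sup_{\theta \in \Theta_\delta} |e_k(\theta)| = O(k^{-1-\epsilon}), \quad \sup_{\theta \in \Theta_\delta} |e_{k;m_1}(\theta)| = O(k^{-1-\epsilon}) \quad \text{and} \quad \sup_{\theta \in \Theta_\delta} |e_{k;(m_1,m_2)}(\theta)| = O(k^{-1-\epsilon}).$$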
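
Proof of Lemma 5.7. Letting $X_t = (1 - B)^d Y_t$, then $\phi_\Lambda(B)X_t = \psi_\Lambda(B)u_t$. By
Lemma A.1 in Francq and Zakoïan (2000), there exist sequences $(c_k(\Lambda))_{k \ge 0}$,
$(c_{k;m_1}(\Lambda))_{k \ge 1}$ and $(c_{k;(m_1,m_2)}(\Lambda))_{k \ge 1}$ such that

$$u_t(\Lambda) = \sum_{j=0}^{\infty} c_j(\Lambda)X_{t-j}, \qquad \partial u_t(\Lambda)/\partial\Lambda_{m_1} = \sum_{j=1}^{\infty} c_{j;m_1}(\Lambda)X_{t-j}$$

and

$$\partial^2 u_t(\Lambda)/\partial\Lambda_{m_1}\partial\Lambda_{m_2} = \sum_{j=1}^{\infty} c_{j;(m_1,m_2)}(\Lambda)X_{t-j}.$$

Further, there exists an $r \in [0,1)$, such that

$$\sup_{\Lambda} |c_j(\Lambda)| = O(r^j), \quad \sup_{\Lambda} |c_{j;m_1}(\Lambda)| = O(r^j), \quad \sup_{\Lambda} |c_{j;(m_1,m_2)}(\Lambda)| = O(r^j).$$

Note that $X_t = \sum_{s=0}^{\infty} \phi_s(d)Y_{t-s}$, where $\phi_s(d) = \Gamma(s-d)/\{\Gamma(-d)\Gamma(s+1)\}$.
Therefore, we get $u_t(\theta) = \sum_{k=0}^{\infty} e_k(\theta)Y_{t-k}$, where
$e_k(\theta) = \sum_{j=0}^{k} c_j(\Lambda)\phi_{k-j}(d)$. The conclusion follows from the
definition of $\Theta_\delta$ and the fact that $d_0 \in (0, 1/2)$.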
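
The fractional differencing weights $\phi_s(d)$ in the last step are easy to compute. The sketch below (function names are hypothetical) generates them by the recursion $\phi_0 = 1$, $\phi_s = \phi_{s-1}(s-1-d)/s$, which follows from the Gamma function formula, and also evaluates the closed form directly for comparison.

```python
import math

def frac_diff_coeffs(d, n_terms):
    """Coefficients phi_0, ..., phi_{n_terms-1} of (1 - B)^d = sum_s phi_s B^s.

    Uses the recursion phi_0 = 1, phi_s = phi_{s-1} * (s - 1 - d) / s.
    """
    coeffs = [1.0]
    for s in range(1, n_terms):
        coeffs.append(coeffs[-1] * (s - 1 - d) / s)
    return coeffs

def frac_diff_coeff_gamma(d, s):
    """Closed form Gamma(s - d) / (Gamma(-d) * Gamma(s + 1)); d must not be an integer."""
    return math.gamma(s - d) / (math.gamma(-d) * math.gamma(s + 1))
```

For $d \in (0, 1/2)$ the weights are of order $s^{-1-d}$ as $s$ grows, and convolving them with the geometrically decaying $c_j(\Lambda)$ preserves this polynomial rate, which is the source of the $O(k^{-1-\epsilon})$ bound in Lemma 5.7.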